DOCUMENT RESUME

ED 317 580                                            TM 014 629

AUTHOR          Richards, Llyn, Ed.; Croft, Cedric, Ed.
TITLE           The Best of "Set" Assessment.
INSTITUTION     Australian Council for Educational Research,
                Hawthorn.; New Zealand Council for Educational
                Research, Wellington.
PUB DATE        89
NOTE            108p.
AVAILABLE FROM  Set, New Zealand Council for Educational Research,
                Box 3237, Wellington, New Zealand. Set, Australian
                Council for Educational Research, Box 210, Hawthorn,
                Victoria 3122, Australia.
PUB TYPE        Collected Works - General (020) -- Reports -
                Evaluative/Feasibility (142) -- Tests/Evaluation
                Instruments (160)
EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     *Achievement Tests; Anthologies; Criterion Referenced
                Tests; *Educational Assessment; Elementary Secondary
                Education; *Evaluation Methods; Foreign Countries;
                Intelligence Tests; Item Banks; *Measurement
                Techniques; Nonverbal Tests; Observation; Scores;
                *Standardized Tests; Test Bias; *Test Use; Writing
                Evaluation
IDENTIFIERS     Australia; New Zealand



ABSTRACT 

This package contains articles in three general
areas: items covering measurement topics; brief and practical guides
on measurement techniques; and professional reading on broader
assessment issues. The purpose of these publications by "Set" is to
provide research information to teachers. The initial article,
"Overview of Issues in School Assessment" (B. McGaw), was written for
this compilation. The other 13 items were all published in "Set"
between 1978 and 1987. They are: (1) "Achievement Test Scores in
Perspective" (W. Turnbull); (2) "The Foundations of School Testing"
(C. Croft); (3) "Test Evaluation Sheet" (S. Larsen and D. Hammill);
(4) "Assessing What They've Learned" (W. B. Elley); (5)
"Criterion-Referenced Measurement" (G. Rowley and C. Macpherson); (6)
"Investing in Item Banks" (N. Reid); (7) "Combining Scores" (A.
Gilmore); (8) "Evaluating Writing" (D. Phillips); (9) "Observation:
The Basic Techniques" (B. McMillan and A. Meade); (10) "One Extreme
to the Other: A Report on Profile Reports" (G. Withers); (11)
"Non-verbal Tests in Schools" (C. Croft); (12) "Does Intelligence
Equal Learning Ability?" (J. Jenkinson); and (13) "Test Bias! Test
Bias!" (N. Reid and A. Gilmore). (SLD)



* Reproductions supplied by EDRS are the best that can be made
* from the original document.







set: research information for
teachers, is published twice a
year by the New Zealand Council
for Educational Research and
the Australian Council for
Educational Research.



General Editor: Llyn Richards 
Special Editor: Cedric Croft 
Cover Design: John Gillespie 
Australian Editor: Peter Jeffery 



Copyright: NZCER/ACER 1987
Copying by arrangement:
Achievement Scores in Perspective
by Bill Turnbull (Copyright held by
Educational Testing Service,
Princeton, New Jersey, USA).
Copying Permitted: Copyright on all
other items is held by NZCER and
ACER who grant to all people
actively engaged in education the
right to copy them in the interests
of better teaching; simply
acknowledge the source.



Subscription orders, and 
inquiries, to: 

New Zealand 
set. 

NZCER 
Box 3237 
Wellington 

Australia 
set 
ACER 
Box 210 
Hawthorn 
Victoria 3122.






In this package are gathered together:

■ items covering measurement topics;

■ brief and practical guides on 
measurement techniques; 

■ professional reading on broader 
assessment issues. 

We bec^in vvrtfi CVrnvrvv oHssncs \Sr'\ % ■ ' 
on them but fhv^y .ifr«tnf',i,Hi m 



Assessment Issues and 
Measurement Concepts 



Test Evaluation Sheet

Measunement and Assessment 
Techniques 



Investing in Item Banks
Combining Scores

Reporting 



Profile Reports

Assessment, Abilities and Culture

Does Intelligence Equal Learning Ability?
Test Bias! Test Bias!




Summaries 



Assessment issues and 
Measurement Concepts 



Overview of Issues in School
Assessment.

the fornix and thr u.vfr/?/ M .I'^f.i -s^.; ■^^ 

summafisinq fri-ifUs ju^ ^ 

outlined Ine fijrntitnrr^t.'j)^ of 'li^"; 

fOt^pons^/ thnnv ^I'V x o\x:{i\\ 

torhfifqtu^ o?!ivr> an f^!i>i;f nt ^n . ^ s ' v ^. 

Achievement Test Scores in Perspective.

T'ho^ur.tis -M^n t^!te<'' V 'tut*' - : '^■iv ' 

tests acCUJ.U y. ('t'lt'l.l V 1\ ^val 
CompiHabiilv vv'Kit^-nt 

the micfumeHT thr wn^it/ ^a^f s,; « 

n^ippfopfjatv ur.i's 

Foundations of School Testing.

Cedric Croft

This ttpm OUtiinps .-ir*M * >r . 
clfc!mcntsof tfU' 4 ()!u. eplV'ot v./'>.7;s ■ ;.{ : 
con<:unent. pftMju f;vi i.:»fihlK;. t :» 

biJii! arouiui i'onun(^n ^.l.-iM.ia^ I 

and hke^ oxitn-^V'^n ,<r<f H*'- • ' 

in sichoo^s rv4iK ti itu: 

Test Evaluation Sheet.
Cedric Croft

Designed to accompany the previous
article, this sheet provides a
basis for evaluating
published tests and possibly teacher-
made tests too. A worked example showing an
entry for a published test is included.



Measurement and Assessment 
Techniques 

Assessing What They've Learned.

:in{\ V>UU','\ » ^ \:* 

i 

Criterion Referenced Tests.

•> t. *t!i t . ] V 

Investing in Item Banks.

Hf-f«i f/a^>H :• IV*' .1" >^ 'V;''A iii ^'W 

ui r.M 'r>^ if 'Ki' V'' a V} . i , va ' 



Combining Scores.

\ ^^^'^ r ^ aifft Hr;\ . • ^a la? ;-.* ■ 

t ; • 'f'pM' 'N- rV VV!^'-a *V)>f . -M s- 

' M;:iy la/ !■ nna'^^ t'c *niilrati!* fv ' VVr.a 

j !U i'nn-. aiir-t pe COniptmaJ VVarKi^ct 
i i?xafnpitfb show i\cMm pfu(,c?:..s 



Evaluating Writing. 



Pit' M *!\ 



Assessment, Abilities and Culture



Overview of Issues in 
School Assessment 

by Barry McGaw




Purposes of Assessment 

iicH-ndfs. Il t»in establish thr irvf! ot i\i\w\cn\vi\i o\ a 

sUnuiirds i)j wilh jvft M'ini* U> thv .nhfovonii^nts o\hvr 
sliuU^nls. )l \,U) mdhMv \hv .iicus in k'tirniiu: %inii 

tiwhing *in' in nwd ut nnprtn *Mtu*nl. I! t,iri jMtnulc- \hv 

Diagnosis

For many the most important purpose is diagnosis. The teacher
needs to identify the particular misconceptions a student has
developed and then seek to remove them. The task of diagnosis is
more straightforward in highly structured subjects such as
mathematics but it is possible in all, so long as the teacher
is clear about what knowledge and skills are to be developed.

Systematic use of diagnostic assessment can identify for
the teacher deficiencies in teaching. If a particular
misconception is formed by a reasonable number of students then
a teaching deficiency and not just a learning deficiency is
indicated. This can alert the teacher to the need for general
supplementary instruction and to the need for a different
instructional pattern with future groups of students.

Determining Levels of Achievement

The second broad purpose of assessment is to determine
the levels of achievement of individuals or groups. That can
be done in either of two ways. One approach is usually
called 'criterion-referenced' assessment because it involves
the definition of standards of achievement as criteria
against which performances are judged. The other approach
is usually called 'norm-referenced' assessment because it
involves comparison of performances with the average or
norm achievement of some reference group of students.

These approaches appear to be fundamentally different,
one using some absolute set of standards as the point of
reference, the other using actual achievements of other
students for comparison. In fact, there is some overlap. In
defining criterion levels of achievement, say for mathematics in
Year 5 (Standard 4) in the primary school, the choice must
be based on some consideration of what it is reasonable to
expect of students of this age and experience and thus on
some consideration of what they typically achieve. To this
extent, the criteria are normatively defined.

For norm-referenced assessment, the choice of reference
group is important. The most restricted case involves the
comparison of a student's achievement only with that of
others in the same class. If the achievements are established
on a particular test prepared by the class teacher and
administered only to that class then, of course, no more is
possible. The result, however, may be quite misleading if
the class is atypical in any way. For example, if a student
is declared to be below average or of low rank in a class
that is as a whole well above average it would be wrong
to conclude that the student is in any general way 'below
average'. Without evidence of the high standing of the whole
class, the student, the student's parents and even the teacher
may not be able to avoid this incorrect interpretation of the
normative assessment of the student's performance.

Evidence about the general level of the student's and the
class's performances can be obtained with standardised
tests. With such tests, the levels of performances of a
standard group of students, genuinely representative of the
reference population, have been determined. The individual
can then be compared to the population and not just to the
local class. One important disadvantage of this approach is
that the standardised test might not match the teacher's
instructional purposes as well as the teacher's own test
could. A mix of test types is, therefore, of most value.

Both norm-comparisons among students and criterion-
comparisons with defined standards can be achieved
simultaneously. The purposes may be different but the strategies
are not as different as many have supposed. Norm-referenced
tests are designed to spread students out to establish
the nature of differences among them. When this is done
it establishes also the types of things that only the best
achievers can do, those that the average students can do,
and so on. The result can be an ordered set of criteria. The
technical basis for achieving such an integration of
norm-referenced and criterion-referenced assessment is outlined
later.

fiiii^iut^ the Quii^ of Irtnnin^ ivul Jhiihiu^ 

There is another distinction in purpose that cuts across the
one already made. This distinction was first introduced in
discussions of curriculum evaluation. It is the distinction
between formative and summative assessment. The purpose
of formative assessment is to improve teaching and
learning. The purpose of summative assessment is to certify
levels of achievement in some final judgment on a student's
progress at a particular stage of education such as in an
end-of-year report or an end-of-schooling certificate or in
some final judgment on the teacher's performance.

Diagnostic, criterion-referenced and norm-referenced
assessments can all serve both formative and summative
purposes. For formative purposes, however, it must be said
that diagnostic and criterion-referenced assessment are
more relevant. They identify more clearly the deficiencies
that need to be dealt with.

Necessary Properties of Instruments 

ALL ASSESSMENT INSTRUMENTS must satisfy certain
technical requirements if they are to achieve their
purposes. The traditional expression of these requirements
was to say that instruments must be valid and reliable. The
validity criterion remains much as it has been for several
decades. The reliability criterion has been recast in important
ways over the last decade or so through some significant
developments in psychometric theory.



Validity

Ccintcnl wihdiis ui'p'tu!^ on Ihi' .nf<-i]ihn\ \\ \lh >%hiji 
llu* pvitt>n^tanti*s ilrni^nnhii in ihr fr^( jrr u f^r "-rnl»n;\ * 
tht> kni»^% iiHip» .uiii skiiK in tht^ doin^m to be tiN^^ s,,'^! 
I his u'ijiuu's lh,i{ iht' lUmi.nn In- vlrt nvd Ih.ii itu m^ih^Hl 

tMisiiu* that it t4>nvsp4Muis With ihv iunnuluo) i-niphas^v 
in tlu^ ifilfjuK'tf u.nn. In phuikc, mans <t%uhvisan\i 
iPiiCi*rn ihi*in*-vh i*s t»nh ui!h ilw Uuv \a!idil\ iti an inslju 
mm}., askmu onlv \\hrtii*M i; U^nks lu' a^->i.'^^\n\\ tlu- »ip 

Criterion-related validity depends on the accuracy with
which test scores can be used to make inferences about an
individual's probable standing on some other variable, the
criterion. If the criterion is some future level of performance,
such as results in the first year of higher education and the
test is of current performance, such as school Year 12
assessments, the correlation between the two sets of scores
is a measure of the criterion-related validity of the current
assessments. In this case it can be described as predictive
validity. In other cases the concern may be with finding
a shorter way of assessing current status without direct and
elaborate assessment. For example, a test may be developed
as a quick way of identifying the extent and nature of a
student's reading difficulties. In this case the correlation of
scores on the test with other more extensive assessments
of reading performance is a measure of criterion-related
validity, here called concurrent validity.

Construct validity depends on the extent to which an
instrument measures the theoretical construct it is intended
to measure. A theoretical construct is an idea used to explain
or organise knowledge. Terms such as mathematical aptitude,
anxiety and reading readiness are labels for such
constructs. Evidence of construct validity cannot be
provided by a single correlation with some other measure or
by systematic analysis of the content of a test. It depends
on an accumulating body of research on the topic.

Most of the debate about school assessment is about
validity. Those who criticise attempts to monitor the
performance of schools or educational systems with system-wide
testing programmes, for example, are usually concerned
that the focus of the testing is too narrow and fails to reflect
the full range of objectives to which the educational effort
is directed. Those who criticise external examinations are
usually concerned about failures of the examinations to test
an adequate sample of the performances required by the
curriculum and their propensity to test irrelevant
things such as capacity to work quickly under pressure.

Traditional discussions of the precision with which an
instrument measures have been cast in terms of reliability. The
correlation between scores obtained by a group of students
measured twice with the same test is defined to be the
test-retest reliability of the test. The correlation between
scores of a group of students measured with two forms of
the same test is defined as the equivalent forms reliability.
The correlations between scores on all pairs of items within
a test are accumulated into a single index of internal
consistency which is the most commonly used index of reliability.
It provides an index of how well the components of a test
measure the same thing.
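
The three indices just described can be sketched in a few lines of Python. This is an illustration only: the scores are invented, the helper names pearson and standardized_alpha are hypothetical, and the use of the standardized form of coefficient alpha (built from the mean correlation over all pairs of items) is an assumption, since the article does not name the particular internal consistency index it has in mind.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Raises ZeroDivisionError if either list has no variation.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_alpha(item_scores):
    """Internal consistency from the mean correlation over all item pairs.

    item_scores holds one list per item, each with one score per student.
    Assumed formula (standardized alpha): k * r / (1 + (k - 1) * r).
    """
    k = len(item_scores)
    pairs = list(combinations(item_scores, 2))
    mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return k * mean_r / (1 + (k - 1) * mean_r)


# Invented data: five students measured twice with the same test.
first_sitting = [12, 15, 9, 20, 17]
second_sitting = [13, 14, 10, 19, 18]
print("test-retest reliability:", pearson(first_sitting, second_sitting))

# Invented data: three items scored right (1) or wrong (0) for five students.
items = [[1, 1, 0, 1, 0], [1, 0, 0, 1, 0], [1, 1, 0, 1, 1]]
print("internal consistency:", standardized_alpha(items))
```

Equivalent forms reliability would use the same pearson call, with the second list holding scores on the alternative form of the test rather than on a second sitting.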

All of these indices of reliability are indirect measures of
the measurement error with which assessments of individual
performances are made. They are derived from a
simple mathematical model of a test score which asserts
that the observed score of a student consists of the student's
true score and an error of measurement which may be
positive or negative. The problem is that the model assumes
that all individuals are measured with the same precision,
regardless of how high or low their score is.

Contemporary test theory offers another way to think
about the basis of scores achieved on a test. Instead of
concentrating on total test score and taking it to be the sum
of true and error scores, it uses a model of what
happens when an individual answers a single item. For this
reason it is called item-response test theory. It takes
individuals and items to be on a single dimension, the
individuals according to their ability and the items according to their
difficulty. If an item's difficulty is right at the level of a
person's ability the person is taken to have a probability of
0.5 (that is a 50-50 chance) of answering the item correctly.
If the item is above the person's ability, the person is taken
to have a probability of less than 0.5 of being correct, with
the probability declining the higher the item is above the
person's ability. For items below the person's ability the
person is taken to have a probability of greater than 0.5 of
being correct, with the probability increasing the further the
item is below the person's ability.
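
The behaviour described in this paragraph can be illustrated with a minimal Python sketch. It assumes the one-parameter logistic (Rasch) form of the item response function; the article does not state which model it has in mind, so the formula, the function name p_correct and the sample values are all assumptions made for illustration.

```python
import math


def p_correct(ability, difficulty):
    """Probability of a correct answer under an assumed Rasch model.

    Ability and difficulty are placed on the same scale; only their
    difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# One person of ability 0.0 facing items of increasing difficulty.
for difficulty in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"difficulty {difficulty:+.1f}: p = {p_correct(0.0, difficulty):.2f}")
```

When ability equals difficulty the probability is exactly 0.5; it falls steadily as the item rises above the person's ability and rises as the item falls below it, which is the pattern the text describes.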

We need not consider the mathematical model employed
to represent this interaction between persons and items.
The model expresses the probability of a person answering
correctly as a function of the person's ability and the item's
difficulty. The only point to note is that by considering the
actual pattern of correct and incorrect responses of a group
of individuals to a set of items, it is possible to estimate the
ability of each person and the difficulty of each item.
Furthermore it is possible to estimate the standard error with which
each estimate of ability and difficulty is made. There is no
need to assume that the precision is the same for all persons
or all items. This point is illustrated in Figure 1.



[Figure 1. Ability/Difficulty Continuum: items in order of
difficulty and persons located on one scale, with an "Easy" end
marked.]

With this approach to the estimation of persons' abilities
and items' difficulties, it is possible to obtain a clear
definition of the ability/difficulty continuum. The ordering of the
items by difficulty can help a teacher understand the relative
levels of different tasks and can help with the definition of
criteria for criterion-referenced assessment. The location of
t^tndenb * \ iibililv on llu* hatm* ^alv pnnaU'H, Nli»inj*i 
If inu'h it I * sivm, a dmn t 4 nl4Ti*Mvn"lt'n tui'd a*»M'v** 
invnl <)) !h* ;viinrin«int I"-. At !hi' sinto Him\ ihr mo.isuir 
ment alkms h^i inmpan«*4)ii .Hnon>; Hludt*n{H .iiuj .1 tin 
lovis ot piMhuniijniC nt fi'it'n»ni4' i;nnip^ Ii4!vi' Inu'n v^i^b 

i^t^n 4 rilt*ruMv H»U'irMi rvi and mnnv w\v}in% k\\ ♦in^i *,*, 
numf, ifiK appio.ivh iiUuUN t ilhtM t^^ h4»lh inuh ilnkm 
.HuMiiini; in tlu^ liMiinM s or &v siml* nl s H i}uiM'mi'nK ill 

iiUoi 'laln^n ,ilHnil U'sl piupi j iu*s and M^mii indii uin/jl it^st 
si'Mvs in ttMHis i»t iliiH jppuMiii. I liN^ii,U it'si iluH»r\ \\ \ib 
HNn*MN>n>o) rchtibflih .jniKtuiinu^iu ni^r^ i>{ hum- uiimiumiJ 

i>Hois 4111 in!r\;r»)tiun i>f !i4inii u'li u ni i-il ^nui LriU rii^n- rolrt 
iMUiHi jNSi'NsnuMil 

Forms of Assessment 

Informal Ob<^crvathvi 

TfAtHIRs MAk! IMOKMAl ^S^!SS^1!\is iU 

asks ,f i|iu»sfjtMi. ihr tU^jNUi rv .110 ina!ti.iti»d *i JiH ts?i>n 

nit*nl. fu.nii'p! fhr ir^jninM* .is r\ uUtu r 0! iHuli'isinniiinv; 
ond 4 iuitituu* f!i4^ inslnii tinn, di |4t .u 4 i*p1 \\ as i-v ul4Mu 4' 
lit lat k tit iir J4»istjndinr, aiul si in u piMf m iiiin ru isr u»in 

This type of assessment is an essential element of all good
teaching and should not be undervalued simply because it
is not structured and formal. To use it well requires
considerable professional skill. To use it wisely requires that it not
be the only form of assessment used.

Teachers can find it difficult to accumulate an overall
assessment of progress from the questions and answers that occur
in the ebb and flow of instruction. More formal and
structured assessments can provide the overall view. To achieve
this teachers design and administer 'tests'. The tests might
involve completion of practical work in a laboratory, solution
of problems, or writing an essay in class. They might involve
similar work completed at home or under examination
conditions. Some tests will be objective in the sense that it is
clear what constitutes a correct answer. Others will be
subjective in the sense that the teacher must exercise judgment
in determining the adequacy of the response. Neither form
is adequate alone.

External Assessments

In some circumstances, assessments can be based on tasks
not designed by the individual teacher. The teacher may
choose to use a standardised test in order to compare
achievements of students in her class with those of a
reference group. The choice of standardised test should then be
based on the relevance of its content and the appropriateness
of the reference group on which the normative information
has been established. The manual for the test should
provide explicit information about the sample of persons
whose performances provide the norms, make clear how
old the data are, and give evidence of the reliability and
the precision of the test.
In the case of public examinations, the design of the test
is not in the hands of the teacher but fidelity to a previously
defined curriculum is at least intended. In Australia public
examinations remain only at the end of secondary education
where they serve in the selection of students for admission
to higher education. In New Zealand there are now public
examinations in the third and fifth year of secondary
education, with an externally moderated certificate in the fourth
year. In the United Kingdom a new common curriculum is
being defined for primary and secondary education and
public assessment procedures are being planned at various
grade levels. In the USA without such a curriculum
definition more general standardised tests are used. The US
practices and the UK developments reflect an external
imposition of assessments to provide public reports on the
achievements of schools and the educational system as a whole.
Pressure for this kind of assessment may well grow in
Australia and New Zealand as well.

Content of Assessment 

THE CONTENT OF ASSESSMENTS can obviously be
described by the subject concerned. There are tests of
reading, of science, of mathematics and so on. A more general
description of content can be provided using the type of
outcome or process being assessed.

The most common forms of assessment focus on outcomes
and then only on cognitive outcomes. Teachers are usually
clearest about the intellectual skills that they are seeking to
develop, though they often build tests which concentrate
on the recall of information only. Higher order cognitive
skills are important in teaching and also in assessment.
Tests should require more than recall of factual information.
They must also require the use of skills such as making
inferences, analysing, synthesising, and evaluating.

Psychomotor Outcomes

In niiin\ t astes, Ifirri' iUi* inipi^rlanl ps\i iioni^^hM obftHlm^s 
uhuh siioifld bi* thi* st^bfeii ot assfssrni'nt *is W4*}! as t^t 
liMiliini; l.aKnaior\ prtHetiurt^s hi stit-nitv tjiMljfv ot 
h*ihnu]ui' HI a?K h't bnoUn;^. vv<>in1work and sd <in art" alf 
mipoUanl .ind u»U'\ant t ntiM ia hn |ud);m>; pei UM miuneant'l 
ail n th it psvthiMnohH 4>i^f4'4 tis t s 

Affective objectives can be more problematic. If 'liking for
science' is a teacher's objective for students, should a student
be judged negatively if that outcome is not achieved,
regardless of the level of the student's achievement of other
objectives? Is it indoctrination to demand that particular attitudes
and dispositions be developed and displayed? Should
cognitive and psychomotor performances be the only bases on
which the student is judged? Is the legitimacy of judging
a student clearer with an affective objective like 'tolerance
for ethnic differences'? This is not the place to engage in a
detailed debate. The point is raised to make clear that school
assessment raises ethical as well as technical questions and
to emphasise the problematic nature of assessment of
affective objectives.

Learning Process

Recent developments in assessment in an alternative upper
secondary programme in Victoria have seen the introduction
of a new emphasis on the learning processes themselves
as well as on the outcomes. The approach has been labelled
'goal-based' assessment because it involves an initial
negotiation between teacher and student about the goals to be
achieved in the particular course.

The goals are statements of content and expected
performance and also statements of the activity designed to
achieve the performance. In an English unit for example,
the goal will specify a workload (so many words or essays
in a certain time), a standard (clear expository writing), and
a process (developed through successive draftings,
corrections, editing and rewritings). Satisfactory performance
requires satisfaction of all components.



Summarising Assessments 

FOR EACH INDIVIDUAL STUDENT, many assessments
can be obtained within individual courses and across
courses. How this information might be summarised is an
important question. A traditional approach has been to
produce some aggregate or average as a summarising statistic
to which all the information might be reduced. Reducing
the results to a single index like this, of course, loses much
information. An alternative is to retain at least some separate
measures and to present the information as a profile of
results, thus preserving details.

Aggregates

In some Australian states the most prominent use of a single
aggregate as a summary of performance is in producing
tertiary entrance scores. A student's results at the end of
secondary education, in some number of subjects, are
aggregated to produce a single score. From these an overall
ranking of candidates for admission to higher education is
produced. The reduction of the information to a single
aggregate and a single order of merit is justified on the grounds
of the competitiveness with which access to higher
education is sought.

What is lost in the aggregate is the multidimensionality
of the original data. A student with high scores in
humanities and low scores in mathematics and science can
obtain the same average as a student with high scores in
mathematics and science and low scores in humanities and
as another student with average scores in all. Given the
specialised nature of study in higher education, it makes little
sense to consider these applicants equal. One is much
better prepared to study science and another much better
prepared to study the humanities. One of the more powerful
arguments for ignoring these differences and persisting with
a single aggregate is that the alternative would require
additional constraints upon subject choices that students
might make in upper secondary education. The use of an
aggregate, without consideration of the mix of subjects from
which it is derived, is seen as a substantial and important
concession to freedom of choice in secondary education.
The loss of information in the reduction of the assessments
to a single aggregate is then accepted as an acceptable price
to pay.
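
The loss of information can be made concrete with a small arithmetic sketch in Python. The three students and all of their scores are invented for illustration, and a plain unweighted mean stands in for whatever aggregation rule a real tertiary entrance score would use.

```python
def average(scores):
    """Unweighted mean of a student's subject scores."""
    return sum(scores.values()) / len(scores)


# Invented results for three hypothetical students.
profiles = {
    "strong in humanities": {"humanities": 90, "mathematics": 60, "science": 60},
    "strong in maths and science": {"humanities": 50, "mathematics": 80, "science": 80},
    "average in all": {"humanities": 70, "mathematics": 70, "science": 70},
}

for label, scores in profiles.items():
    print(f"{label}: aggregate = {average(scores)}")
```

All three students receive the same aggregate even though their profiles differ sharply, so a ranking built on the aggregate alone cannot separate them.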

Profiles 

There is a growing reaction against the use of a single
summarising aggregate or average in favour of a full profile of
results. Reporting a profile of performances is by no means
a new practice. Reports to parents usually provide such
information and give separate results for separate subjects.
Within any particular subject it is common to reduce all
performances to a single score, but even that is not
necessary. Some primary schools, for example, report with a more
fine grained analysis on components of subjects such as
English, preserving information about reading, writing,
speaking, listening and so on.

Even for admission to higher education profiles have
been used in the past. Admission to university in some
countries was gained through the achievement of
'matriculation', obtained with some minimum mix of results
such as two Bs and three Cs. It was only when competition
for places became more intense and the need arose for a
more precise ranking, or at least an apparently more precise
ranking, that a single order of merit was produced on the
basis of a single aggregate for each student.

There are increasing demands for a return to the use of
profiles as a way of retaining some of the richness of the
data which is lost through reduction to an aggregate.
Unfortunately, much of the argument for profiles says no more
than: more is better. There is a risk that extensive
information will swamp the user and result in unclear and
irreproducible decisions. Unless clear decision rules are made,
there is no way of ensuring that the decision for an
individual student will not be simply the idiosyncratic judgment
of the person who happens to deal with the information.
Where no selection is involved, the profile can more readily
stand as an appropriate rich report of performances.

As an example of the type of decision rule which might
be made for using a profile for selection purposes, consider
a simple case. Reduce to a two dimensional profile the
humanities and mathematics/science scores; the two pieces
of information for each student can then be used in different
ways for different purposes. For admission to an
engineering programme, some minimum score on the humanities
scale may be required and, for all those who satisfy that
minimum, a rank order on the mathematics/science scale
might be established to determine admissions. For
economics, where both verbal and quantitative skills are
required, an average of the two results may be used. For
admission to a humanities programme, results on the
humanities scale alone might be used. With further pieces
of information in the profile, the decision rules will become
more complex but unless they are made explicit they will
not be reproducible. More will be better only if it is clear
what can be done with it.
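
The three decision rules in this example are simple enough to write out explicitly, which is the point the paragraph makes about reproducibility. The Python sketch below is one possible rendering under stated assumptions: the applicant names, the scores, the humanities minimum of 50 and the function names are all invented for illustration.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    """A two dimensional profile for one applicant (invented scales)."""
    name: str
    humanities: float
    maths_science: float


def rank_engineering(applicants, min_humanities):
    """Apply a humanities minimum, then rank on mathematics/science."""
    eligible = [a for a in applicants if a.humanities >= min_humanities]
    return sorted(eligible, key=lambda a: a.maths_science, reverse=True)


def rank_economics(applicants):
    """Rank on the average of the two scales."""
    return sorted(
        applicants,
        key=lambda a: (a.humanities + a.maths_science) / 2,
        reverse=True,
    )


def rank_humanities(applicants):
    """Rank on the humanities scale alone."""
    return sorted(applicants, key=lambda a: a.humanities, reverse=True)


# Invented applicants.
applicants = [
    Profile("Ana", 90, 50),
    Profile("Ben", 50, 90),
    Profile("Cal", 70, 70),
    Profile("Dee", 40, 95),
]

print([a.name for a in rank_engineering(applicants, min_humanities=50)])
print([a.name for a in rank_economics(applicants)])
print([a.name for a in rank_humanities(applicants)])
```

Because each rule is written down, two selectors working from the same profiles reach the same order. Note that the economics rule ties the first three applicants on their average, so a further explicit tie-breaking rule would be needed in practice.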

Professional Obligations 

EVERYONE WHO TEACHES has a professional
obligation to assess performance. It is necessary to monitor
the effectiveness of one's teaching, as well as to inform
learners (and any others responsible for them such as
parents) of the success of their learning.

There is also an obligation to provide assessments that
are both criterion-referenced and norm-referenced. The
criterion-referenced assessments can provide a clear
indication of what the student has learned effectively and what
is yet to be mastered. The norm-referenced assessments can
offer a point of reference in average performance that can
give students and parents some idea of how personal
performance is related to general levels of expectation.

To end with a warning. Some teachers prefer to assess
performance against the student's 'capacity'. This depends
crucially upon the teacher's assessment of capacity. Teachers
may have no evidence of capacity apart from the very
performances that are the subject of the assessment of
achievement. Sensitively used, normative assessments can then
provide a useful supplement to criterion-referenced
assessments.



Dr Barry McGaw is Director of the Australian Council for
Educational Research, Box 210, Hawthorn, Victoria 3122, Australia.



Copying Permitted

© Copyright on this item is held by NZCER and ACER who
grant to all people actively engaged in education the right to
copy it in the interests of better teaching.



Achievement Test Scores in Perspective



By William Turnbull

Educational Testing Service, Princeton



The need to assess how much students have learned
has been fundamental in education for as long as
there have been students and teachers. Long before
standardized tests of achievement came on the
scene, teachers were making such judgements. They
based them on information gleaned from familiar
sources: direct observation of students' work, class
recitations, conversations with the student or other
teachers, daily quizzes, and final examinations. All
these bits of information entered into the marks the
teacher gave. They still do and they should.

Since we had all these techniques at our disposal,
why then were standardized achievement tests
welcomed when they came along? There are several
reasons.

An Inexact Business 

Teachers knew what an inexact business marking
really is, and welcomed a new development that held
promise of improving their information. Among the
virtues of standardized tests, three were particularly
appealing. The first virtue was accuracy. Studies were
made to compare the amount of random error in the
traditional kinds of information with the amount of
error in the newer standardized tests in the same
subjects, and the findings were consistent: the test
scores were consistently more accurate.
Furthermore, they had a second related virtue,
objectivity, that helped overcome some other
problems. Some teachers marked hard and some
marked easy, some based their marks heavily on
deportment, some on how hard students tried rather
than how much they achieved. The standardized tests
knew nothing of those things, nor of sex, race,
honesty, amiability, or love of one's fellow man: only
how much the student had learned. Third, scores on
standardized tests of achievement had the virtue of
comparability: they showed not only how well
particular students performed in comparison with
their classmates but also how well they did in
comparison with pupils in other classes, other
schools, and other districts.

We had, then, the basis for a fine combination of
techniques: standardized tests, which could measure
sheer accomplishment in several areas very well, and
teacher judgement, which could add dimensions
inaccessible to standardized testing but pertinent to
the interpretation of pupil scores.

We have learned a hard lesson in the ensuing
years: how difficult it is to keep achievement scores in
that perspective, to see them as a valuable ingredient
in the mix of information that indicates how much an
individual student or a group of students has learned
in a particular area. Scores are not, by themselves,
sufficient. That fact needs to be reiterated. But they
do remain a key component in any adequate learning
information system.

Three Fallacies 

Undoubtedly, testing has suffered more from the
excessive expectations of its most devoted advocates
than from the attacks of its critics. I have mentioned
three virtues of tests that have enhanced their
usefulness. Let me balance the account by
mentioning three fallacies that, in the form of
over-expectations, have invited the misuse of tests.

The first I should call the Micrometer Fallacy. Some
people have invested test scores with a precision
and infallibility that they never possessed. The
discovery that they are not perfectly accurate
has obscured the fact that they are more accurate
than most of the alternatives.

The second one I shall call the Whole Person
Fallacy: the tendency to read into achievement test
scores much more than they really tell, which is
simply the amount a student has learned in a given
subject. Some people are let down upon discovering
that achievement tests fail to measure a variety of
traits like honesty or leadership or social
consciousness, and in their disappointment over the
fact that the tests do not describe the whole person,
forget that they do a rather good job of measuring the
academic accomplishments they purport to measure.



'. . . testing has suffered . . . from the excessive
expectations of its most devoted advocates . . .'



Third there is the Equal Preparation Fallacy. Some
people expect the test to compensate somehow for
the differences in academic development of children
whose learning opportunities have differed
dramatically. The test score tells you nothing about
the difficulties a student has had to overcome to
acquire a given level of proficiency, but it tells you a
great deal about what that level is, a fact that is
central to deciding what the student is ready to tackle
next.



Comparable Results 

The same three virtues of test scores - accuracy,
objectivity, and comparability - that had appealed to
teachers interested in the accomplishments of
individual students also commended themselves to
administrators. For the administrators the greatest of
the virtues was (and is) comparability.

They saw at once the power of standardized tests
to permit comparisons of the learning achieved by
pupils of different teachers, or in different schools or
in different districts. Since it was well known that
some schools were much more demanding than
others in their curriculum and in their pass marks,
here at last was a chance to put all students on the
same footing in an objective comparison of results
across schools, to make sure the children were
learning as much as they should, as determined by
how well the children elsewhere were doing.

In the main, these aspirations were sound. And
when the three fallacies have been resisted (when
people have not expected the precision of the
micrometer, have not looked to achievement tests to
measure the whole person, and have not assumed
that by standardizing the test you have standardized
preparation) then, indeed, the scores have given a
new dimension, through their comparability across
geographic areas and spans of time, to the
information available to educators. It is hard to
imagine any other basis we might have had for
learning, for example, that the verbal and
mathematical skills of students applying to go to
American universities have declined in the last 15
years. It may not have been a comforting message,
but it was one worth getting.

Let's look at a third major use of standardized
achievement test scores in America: in the
selection of students, especially for university. This
use got an important boost during World War II, when
teacher power was unavailable to grade the entrance
essay tests. The boards of entry substituted objective
tests as a war measure because they could be graded
so efficiently, fully expecting to return to the essays
after the war. But, to the surprise of many, not only
were the objective tests easier to grade, the scores on
them were at least as effective as essay test scores
had been in measuring the attainments of the
students who took them. This conclusion was arrived
at first through the general observation of the school
and university people involved and then confirmed
through careful research. The scores were by and
large accurate, more so than scores on the essay
tests had been.



'The test score tells you nothing about the
difficulties a student has had to overcome . . .'



Students who were admitted proved to have the
accomplishments the test scores had promised. The
scores provided a common currency, unlike the
grade-point averages that reflected difference in
grading between schools and between parts of the
country. Moreover, they were much less subject to
special coaching than the essay tests had been; they
reflected more fairly the accomplishments of
students from different parts of the country, different
schools, different curricula.

Unparalleled Growth 

As a result, the selective institutions that used the test
scores as one basis for admissions decisions were
able to seek out talented students from every corner
of the nation and every social stratum. They began to
see on campus a more heterogeneous group of
students both socially and geographically. Moreover,
with objective tests of aptitude and achievement in
place and efficient, it was possible to test the
tremendous postwar wave of college applicants under
the new system, and an unparalleled expansion of
higher education took place in this country.

The system worked because of the same virtues of
standardized tests: accuracy, objectivity, and
comparability. Again, however, some people were
tempted into entertaining great expectations that
could not be and have not been fulfilled. The scores
were accurate in the main but not micrometers in
their precision, even though their errors were smaller
than those of other known techniques. They
measured achievement and readiness, but not traits
of character and temperament, not the whole
person. They made no allowance for the
inadequacies of preparation or special circumstances
of environment that the student had overcome in
school. In short, the assumption of equal preparation
remained (and remains) a fallacy.

Guidance for Students 

We have discussed three legitimate and important
uses of standardized tests: their use by teachers to
determine how much the individual student has
learned, by administrators to determine how much
classes and larger groups have learned, and by
university admissions people to discover how well
prepared a prospective student may be. Each use has
made great contributions to education and society
when kept in perspective and used with other
pertinent information.

Another important use of test scores is by students,
to help them as they examine their educational and
career goals and estimate their readiness to
undertake a more or less demanding programme of
study as their next step. This use has received less
emphasis historically than it should have, but the
evidence from a set of tests can add materially to the
information available for a student's guidance.



'. . . there are many gifted youngsters who
make mediocre records in school but surface
through their test scores . . .'



People who now discover that test scores can vary
from day to day and from test to test are on the right
track. Every examination, every judgement about
people, is fallible and has a typical error rate. The
standard error of measurement associated with
standardized test scores is well known because it is
readily determined and regularly announced by the
publishers. That does not mean that other forms of
measurement, such as essays or interviews or letters
of reference or teachers' grades, have no error or a
smaller error. In fact, although such errors are rarely
reported or even determined, research indicates that,
in the typical case, they are much greater than the
standard error of measurement of a test score.

Those who call for a high degree of teacher
involvement in assessing students in their classes are
right, as are those who decry the use of test scores
alone as a basis for evaluating the effectiveness of an
entire educational programme. There is much more
to be considered than is reflected in the score,
including the conditions under which the results were
achieved. But to say what some recent critics have
said, that even if one wants to know how pupils are
doing in arithmetic one should be forbidden to use a
standardized test in arithmetic, is to swing the
pendulum right out of the clock.

Institutions that use test scores to select new
students should indeed use other information as well.
Achievement tests simply cannot measure the whole
person. Obviously, a college should consider the
record of previous school performance and the
judgements of teachers and counsellors. Yet there
are many gifted youngsters who make mediocre
records in school but surface through their test
scores, as Julian Stanley's work at Johns Hopkins
University has amply demonstrated.

Forget the Magic 

Finally, those who would ascribe to tests given at
school age some magic that enables them to divine
genetic intelligence or ability to learn should forget it.
Achievement tests measure developed ability -
developed in relation to a particular subject or
discipline. To a large extent, so do scholastic aptitude
tests, although the areas of experience through
which a student is prepared for them are much more
long term and more pervasive in our society. But
equal opportunity in education simply has not been
realized. To think that, at 18 years of age, people
whose experiences have been vastly different can
show their inborn potential through a test of verbal
and mathematical reasoning is naive, regardless of
their cultural advantages or disadvantages.

Standardized tests of achievement have amply
demonstrated their utility over the past two or three
generations. Because of their accuracy, their
objectivity, and their comparability, they deserve
recognition as a powerful tool in education. They
have suffered in esteem first at the hands of those
who claimed for them a set of qualities they could
never attain, and latterly from the protests of those
who have proposed that since they are imperfect they
be done away with. I suggest we put them in a
reasonable perspective as we strive to improve both
the tests and their use.



Note 

This is a presidential address of W. W. Turnbull to
Educational Testing Service (ETS), slightly edited and
printed by permission.

ETS is a private, non-profit organisation devoted to
measurement and research. It was founded in 1947 by the
American Council on Education, the Carnegie Foundation
for the Advancement of Teaching, and the College
Entrance Examination Board. It has its main office in
Princeton, New Jersey, and is, in fact, the largest
organisation devoted to educational testing in the world.

© Copyright on this item remains with ETS.






The Foundations of School Testing



For teachers specialising in
assessment techniques, and
students anxious to sort out the
theory and practice of testing in
schools.

By Cedric Croft
NZCER

An understanding of validity, reliability and usability is
a must for all test users. The validity of a test is an
indication of how well it measures what the author
claims it will measure; its reliability describes the
consistency or dependability of its scores; and its
usability is concerned with its administration, format,
interpretation and supply.

An alarm clock that keeps accurate time can be
described as being reliable, and if the alarm goes off at
the right hour the clock is functioning validly. If the dial
of the clock can be read with little chance of
misinterpretation, the alarm control operated readily,
and it is robust and easily rewound, the clock could be
described as being highly usable. If it stops, however,
and I fail to reset the hands on winding it up, it is still a
reliable clock in that it continues to keep consistent
time, but the alarm will not function validly since the
time shown on the clock does not conform to standard
time. The clock could still be usable, but the ease with
which it can be used has been affected by the lack of
validity.

Tests also can be highly reliable but not valid for a
particular purpose. A diagnostic test of long division, for
example, can give very reliable results, but it would not
be the most valid test to select pupils for an enrichment
programme in all branches of mathematics. Test validity
is heavily influenced by reliability (the alarm will go at
the wrong time if the clock runs slow) but high
degrees of validity and reliability alone do not
necessarily guarantee usability: think of an accurate
but faceless alarm clock. The usability of a test will also
suffer if reliability or validity are impaired, for example, if
the group the test is to be used on differs substantially
from the group the test was developed for.

1. Test Validities

When we have given a test, and have the scores, what
may we infer from those test scores? What have we
measured? What can the scores tell us? What may we
not infer? These are questions about the test's validity.
Note that validity is inferred, not measured directly.
Evidence of validity is usually presented in a test
manual, but validity cannot be regarded as a universal
and everlasting feature of a test: it is a quality we must
judge; and it may be adequate, marginal, or
unsatisfactory for this group, at this time, for this
specific purpose.



Up to ten types of validity can be identified but the
following four are most relevant to classroom testing:
content validity, concurrent validity, predictive validity
and construct validity.

(i) Content Validity 

You have content validity when the test measures a
representative sample of the relevant knowledge, skills
and behaviour. If fractions are emphasised in your
teaching, does the test emphasise them too? If urban
drift is not covered in your teaching, will its inclusion in
an achievement test be legitimate? If drawing inferences
was your main thrust in science, how many of the test
questions could be answered by recall? Evidence of
content validity is crucial when we wish to generalise
from an individual's performance on a test to the
knowledge and skills the test sampled.

Content validity is of the utmost importance for all
classroom tests, and also relevant to behaviour
checklists, measures of scholastic aptitude, tests of
special abilities, and personality inventories.

For a test to have content validity the test items must
measure the behaviour they purport to sample.
Descriptions of the course or subject matter, the test
objectives, and the nature of the sampling, are critical.
Although some objective procedures can be used to
help assess content validity, the final judgment must
remain the opinion of the test's user and the process will
always be largely subjective.

It is always possible that a test regarded as having
content validity for one school could be invalid for
another school. This would be the case when these
schools had differing objectives, or had chosen different
content or had different emphases. A test of reading
achievement containing a section on skim reading
would be valid for a school that taught the techniques of
skimming, for example, but invalid in a school that did
not have the development of skim reading skills as one
of its objectives. For an achievement test, content
validity will exist when there is close agreement between
the school's objectives and teaching practices, and the
test's coverage. The focus of content validity is firmly on
the adequacy of the sampling of course-content, and
not just on the appearance of the test. Although a test
should look as though it will measure what is claimed,
this 'face validity' is not sufficient. Establishing content
validity must be a prime consideration for everyone
constructing an educational test or examination. But
how can it be done? As the first step, draw up a table of
specifications showing the weighting and emphasis that
will be given to the various aspects of subject-matter
and cognitive process. For an example, see table 1.

The essential question to ask is: 'To what extent does
the content of this test reflect the knowledge and skills I
have tried to develop in these pupils?'

(ii) Concurrent Validity 

Concurrent validity is an estimate of the relationship that
exists between scores on a test, and some other
acceptable criterion. For example, the performance of
children on New Zealand's PAT: Reading
Comprehension might be compared with their
performance on the Australian ACER Paragraph
Reading Test. A high correlation (0.85 +) would suggest
that these tests are measuring substantially the same
skills, so consequently, their concurrent validity is high.
Note that this does not shed any light on the nature of
the skills being tested. Furthermore, all a low correlation
tells us is that the skills and abilities being sampled by
each test differ.

Table 1   3B Biology, Term 3, Ms Bellman

                                     Cognitive Process
Course Content                Know-   Compre-  Appli-   Analysis  Total
                              ledge   hension  cation

1. Methods of Science,
   Testing Hypotheses           4       2        2         2       10
2. Animal Classification        ..      ..       ..        ..      ..
3. The Plants of the Earth      ..      ..       ..        ..      10
4. Populations and
   Mechanics of Evolution       ..      ..       ..        ..      10
5. Evolution, Genetics and
   the Races of Man             ..      ..       ..        ..      10

Total Items                    12      16       14         8       50

If performance on a test of library skills, for example,
was correlated with 'ability to use a library', it might be
possible to make a statement about the way in which the
test performance relates to 'real life'. However, 'real life'
is a difficult thing to measure objectively, and a low
co-efficient may just mean that the test is not being
compared with a suitable criterion.

Essentially, concurrent validity provides confirming
evidence of a test's validity - validity by association -
the test in question must be valid if it relates well to
another that is already regarded as valid. There may be a
tendency to overvalue the importance of concurrent
validity data because it is numerical, but in reality,
concurrent validity provides us with the least
information about what a test is actually measuring.
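
The correlation referred to under concurrent validity is
normally the Pearson product-moment coefficient. As a minimal
sketch, assuming invented scores for eight pupils on two
reading tests (the score lists and the function name are
illustrative only, and are not data from PAT or ACER tests):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores on each test must vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same eight pupils on two tests.
test_a = [12, 15, 19, 22, 25, 27, 30, 34]
test_b = [10, 16, 17, 24, 23, 29, 28, 35]

r = pearson_r(test_a, test_b)
print(f"correlation between the two tests: {r:.2f}")
```

A coefficient near 1 says only that the two tests rank the
pupils in much the same order. As the text stresses, it says
nothing about what either test is measuring.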

(iii) Predictive Validity

Predictive validity is a measure of the relationship
between test scores and some appropriate performance
at a later date. In New Zealand it would be possible to
investigate the relationship between performance on
PAT: Reading Vocabulary at Form I, and the marks
gained in School Certificate English four years later.

Predictive validity is most crucial for selection and
training where a test is being used to forecast likely
success in a training programme. In the classroom
context, tests of reading readiness or learning disability
are examples of tests that must have their predictive
validity established.

By and large, teachers need not be concerned about
the predictive qualities of standardized achievement
tests used in classrooms since their main concern is
with here-and-now performance.

(iv) Construct Validity

Construct validity concerns psychological traits or
qualities and attempts to describe the underlying
psychological processes that are used in a specific test
situation. A psychologist's 'constructs' are similar in
nature to a physicist's 'models': both are theoretical
notions that are developed to help explain and organise
aspects of existing knowledge. Terms such as 'reading
readiness', 'anxiety', 'scholastic aptitude', 'critical
thinking' and 'reading comprehension' are examples of
constructs. The basic question in construct validity is
not, 'Does this test measure what the author claims it
measures?' but, 'What exactly does this test measure?'
The identification of all factors influencing the test score
is the aim of construct validation.

Despite the crucial importance of this quality, least
progress has been made in establishing sound evidence
for the construct validity of most psychological and
educational tests. There is no satisfactory single
technique for assessing construct validity, nor can it be
firmly established by any one study. The methods used
to obtain evidence of construct validity include (i) logical
analysis of the mental processes used to answer test
items (ii) studies of group differences (iii) studies of
changes in performance over time, particularly when
treatments differ (iv) correlations with other tests (v)
intercorrelation of items within the test.

A Final Word on Validity 

It is worth stressing at this point that all types of validity
are inter-dependent: they provide information on how
well the test measures a defined field (content), how it
compares with other valid measures of a similar type
(concurrent), and how well it predicts future
performance (predictive); and all this information may
be used when considering the test's construct validity. It
is also worth reiterating that a test is not valid or invalid
per se: it depends on the use to which it is put.

2. Reliabilities

To be valid a test must be reliable. In fact, test reliability
has a ceiling effect on test validity: unless a test
measures with some consistency it is not possible to be
sure what the test is measuring. The reliability of a test is
most often expressed as the correlation between one set
of scores (on a test for a specified group) and another
set of scores (on an equivalent test for the same group).
This correlation, usually called the reliability coefficient,
ranges from 0 to 1, which corresponds to a scale from
complete unreliability, i.e., a random fluctuation of
scores, to complete reliability, i.e., perfect consistency
of scores.

Although reliability coefficients of 0.^ or higher are 
reported occasionally, test constructors are satisfied if 
they can achieve reliability in the vicinity of O.M. It is 
tempting to interpret reliability coefficients as the
percentage of scores that are in complete agreement,
but this is not correct. However, we can use percentages
to illustrate the relationship between reliability levels
and fluctuations in test score. Suppose we divide a class
we have tested into two halves on the basis of their
scores; then we re-test. How many children will remain
in the same half following the re-test? If the test has a
reliability coefficient of 1.00 all of them, 100%, will still
be in the same half; if the reliability coefficient is 0.96
then 95% will stay in the same half and 5% will have
moved from one half to the other; and so on.

Table 2   Interpreting Reliability Coefficients
in Terms of Percent of Agreement

Correlation Coefficient    Percent of Agreement by Halves

        1.00                          100
        0.96                           95
        0.90                           90
        0.85                           87
        0.81                           85
        0.76                           83
        0.64                           80
        0.49                           74
        0.25                           66
        0.00                           50

From Robert L. Ebel, Essentials of Educational Measurement.
N.Y., Prentice Hall, 1972.



The reliability coefficient gives an indication only of
whether the test is highly consistent, fairly consistent,
or very inconsistent. However, it is used to determine the
'standard error of measurement', of which more later.

The four major types of reliability that are most
relevant to school achievement and aptitude testing are
calculated by test-retest, parallel forms, split-half and
Kuder-Richardson methods.

(i) Test-Retest

Test-retest reliability is estimated after a test has been
given to a group on two separate occasions. The set of
scores obtained for each individual on the first
administration of the test is correlated with the set of
scores obtained on the second administration. This
gives us a test-retest reliability coefficient.

What pupils do between the two tests can be crucial.
If, for example, they learn things related to the test there
can be marked changes in the scores. As a result the
test-retest reliability coefficient may be artificially
deflated. In addition, doing the test again may not seem
a very useful activity from the students' point of view, so
the second test may be a much poorer measure than the
first. For reasons such as this, studies of test-retest
reliability must be carefully controlled if a valid estimate
of the test's stability over time is to be gained.

(ii) Parallel Forms

In some tests, for example, TOSCA in New Zealand and
OTIS Higher in Australia, there are parallel forms of the
test, which make it possible to measure the same skills
on different test material. Usually a single group does
both forms of the test, on consecutive days. Parallel
forms reliability is really a measure of the equivalence of
two forms of a test.



(iii) Split-Half

The practical difficulties associated with test-retest and
parallel-forms stimulated the development of
alternatives. One of these was to split a test into two
reasonably equivalent halves, usually on the basis of
odd and even items, so that each subject has a score on
the odd items, and another on the even items. The
correlation between the scores on the odd and
even-numbered items is then calculated. The split-half
technique results in a coefficient of internal consistency
that is, in essence, a measure of the homogeneity of the
skills that are being tested.
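
The odd-even procedure can be sketched in a few lines. The
item marks below are invented (1 = right, 0 = wrong) and the
function names are illustrative. One addition goes beyond the
text: because each half is only half as long as the whole
test, the half-test correlation is commonly stepped up with
the Spearman-Brown formula, 2r / (1 + r), to estimate the
reliability of the full-length test. That step is shown here
as an assumption about usual practice, not as part of the
article.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(item_marks):
    """Return (half-test correlation, Spearman-Brown stepped-up estimate).

    item_marks holds one list per pupil, with a 0 or 1 for each item.
    """
    odd = [sum(row[0::2]) for row in item_marks]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_marks]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical marks for five pupils on a four-item test.
marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

r_half, r_full = split_half(marks)
print(f"odd-even correlation: {r_half:.2f}")
print(f"stepped-up estimate:  {r_full:.2f}")
```

A real analysis would use far more pupils and items; five
pupils and four items are used here only to keep the
arithmetic easy to follow by hand.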

(iv) Kuder-Richardson

Kuder and Richardson developed alternative
approaches. Their formula, KR20, which has become
widely accepted as a basis for estimating test reliability,
requires information on the difficulty (proportion of
correct responses) of each item in the test, and on the
spread of scores. As the calculation of item difficulties
can be a time consuming process, an estimate of a test's
reliability can be obtained from Kuder-Richardson
formula 21, which is based on the number of items in the
test, the mean score and the standard deviation of
scores. This 'short cut' approach always gives an
under-estimate of the reliability coefficient when the
items vary in difficulty, as they nearly always do.
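
The two formulas are not written out in the text. In their
usual textbook form, with k items, item difficulties p (and
q = 1 - p), mean score M and score variance V, they are
KR20 = k/(k-1) x (1 - sum(pq)/V) and
KR21 = k/(k-1) x (1 - M(k-M)/(kV)). The sketch below applies
both to invented item marks; the data and function names are
illustrative, and the variance is taken over the tested group
itself (dividing by n), which is an assumption of this
example.

```python
def _totals_mean_variance(item_marks):
    """Total score per pupil, plus the mean and variance of those totals."""
    totals = [sum(row) for row in item_marks]
    n = len(totals)
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    if variance == 0:
        raise ValueError("total scores must vary")
    return totals, mean, variance


def kr20(item_marks):
    """Kuder-Richardson formula 20, using each item's difficulty."""
    n = len(item_marks)
    k = len(item_marks[0])
    _, _, variance = _totals_mean_variance(item_marks)
    # Item difficulty p = proportion of pupils answering the item correctly.
    p = [sum(row[i] for row in item_marks) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - sum_pq / variance)


def kr21(item_marks):
    """Kuder-Richardson formula 21, using only k, the mean and the variance."""
    k = len(item_marks[0])
    _, mean, variance = _totals_mean_variance(item_marks)
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical marks (1 = right, 0 = wrong) for five pupils on four items.
marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

print(f"KR20 = {kr20(marks):.2f}")
print(f"KR21 = {kr21(marks):.2f}")
```

With these marks the item difficulties differ (0.8, 0.6, 0.4
and 0.4), so formula 21 comes out below formula 20, which is
the under-estimate the text describes.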

Reliability and Errors of Measurement

A very practical way of thinking about reliability is to
consider the extent to which an individual's score may
vary from time to time. Every test score is made up of
two parts: a 'true score', and an 'error score'. Error
scores, which can raise or lower an individual's true
score, can come from the test itself, from
characteristics of the individual or from features of the
test administration.

If changes in test scores are not large between
successive testings, the effects of these 'error variables'
have been minimal, and the test is reliable. If a child's
score changed from something around the 67th
percentile to something around the 33rd percentile for
example, over a period of a month or so, and there
appeared to be no good reason for the change, the
reliability of the assessment would be very much in
doubt.

The extent to which an individual's score is likely to
differ from the 'true' score can be calculated and
expressed as a 'standard error of measurement'. The
standard error of measurement gives us an indication of
the absolute accuracy of the test scores and generally
speaking the smaller the standard error of
measurement, the more reliable the test is. The Ravens
Standard Progressive Matrices is said to have a
standard error of measurement of 3. This suggests that
for about 68 percent of cases the errors of measurement
will be 3 points or less, but for the remaining 32 percent
they will be greater than 3.

The standard error of measurement lets us interpret 
test scores as a band, rather than a single score. In 68 
percent of cases an individual's true score will lie within 
plus or minus one standard error of the raw score. If you 
score 30 on Raven's, there are 68 chances in 100 that your 
true score lies between 27 and 33. If the band of scores is 
broadened to encompass two standard errors, i.e. 
24-36, there are 96 chances in 100 that the true score 
falls within this range. Although it may look as though 
some precision has been lost, the use of bands of scores 
increases the chances of reliable measurement. 
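
The band arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the coverage figure assumes that errors of measurement follow a normal distribution, which gives roughly 68 and 95 chances in 100 for one and two standard errors.

```python
from math import erf, sqrt

def score_band(observed, sem, n_errors=1):
    """Return (low, high): the observed score plus or minus n_errors standard errors."""
    return observed - n_errors * sem, observed + n_errors * sem

def coverage(n_errors):
    """Chance that the true score lies inside the band, assuming normal errors."""
    return erf(n_errors / sqrt(2))

# Worked example from the text: a score of 30 with a standard error of 3.
for k in (1, 2):
    low, high = score_band(30, 3, k)
    print(f"{k} standard error(s): {low}-{high}, about {coverage(k):.0%} of cases")
```

Reporting the band rather than the single figure is a reminder that two pupils whose bands overlap may not really differ.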

Usability 

A test must be suitable for the purposes required, and 
there are practical questions that must be asked. 

(i) Is the test readily available? 
No matter how valid and reliable a test might be in a 
particular situation, it will be of little use if you cannot 
get it. Even good tests date after a time, and they do go 
out of print. Some tests remain research instruments, and 
despite sound characteristics may never be published for 
widespread use. Tests that are published overseas can 
take up to 20 weeks to arrive in Australia and New 
Zealand, so supply can be an important factor unless 
long-term planning is carried out diligently. 

(ii) Am I able to administer the test? 
Tests vary in their complexity, and hence there is a 
wide range of practical and theoretical training 
requirements. There are two questions: 1. Do I have the 
background skills necessary for the competent 
administration of this test? 2. Am I allowed to 
administer this test? In New Zealand NZCER 
administers a test user qualification scheme. In Australia 
ACER administers a similar scheme. Certain classes of 
tests are available only to users who possess recognised 
minimum qualifications. The test catalogues available 
from NZCER and ACER list the qualifications and the 
restricted tests. 

(iii) Can I interpret the results? 
Test results can be reported in a variety of ways: age 
percentile ranks, class percentile ranks, deciles, 
stanines, z-scores, T-scores, deviation IQs, and so on. 
You need to be familiar with the properties of the 
transformed scores to interpret the score. It is also 
necessary to know what behaviour is being sampled, so 
that you can see how the test items reflect the abilities 
or aptitudes being tested. Good test manuals help. 
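
As an illustration of how some of these transformed scores relate to one another, the sketch below converts a raw score to a z-score, a T-score and a stanine. The conventions used (T-scores with mean 50 and standard deviation 10, stanines as half-standard-deviation bands centred on 5) are the usual textbook ones rather than anything specified in this article, and the norm-group figures are hypothetical.

```python
from math import floor

def z_score(raw, group_mean, group_sd):
    """Distance of a raw score from the norm-group mean, in standard deviations."""
    return (raw - group_mean) / group_sd

def t_score(z):
    """T-score: the z-score rescaled to a mean of 50 and standard deviation of 10."""
    return 50 + 10 * z

def stanine(z):
    """Stanine: nine bands, each half a standard deviation wide, clipped to 1-9."""
    return max(1, min(9, floor(2 * z + 5.5)))

# Hypothetical norm group: mean 50, standard deviation 10; the pupil scores 65.
z = z_score(65, 50, 10)
print(z, t_score(z), stanine(z))
```

A manual's own conversion tables should always take precedence over formulas like these, since published norms are often based on the actual score distribution rather than on a normal curve.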

(iv) How much time will it take? 
First, time will be spent on giving the test. Then time 
must be spent on marking and interpretation. And have 
you got time to do something about what the test may 
reveal? Is the ultimate usefulness of the test scores 
worth the time it will take to get them? For example, the 
Doren Diagnostic Reading Test of Word Recognition 
Skills is made up of 11 subtests each with at least two 
sections, and takes three hours to administer. With time 
for scoring and interpretation to be added, there would 
need to be very real advantages for the teacher and 
pupil to justify the time involved. 

(v) What will the total cost be? 
This also must be weighed against the ultimate 
usefulness of the information gained. Take the probable 
effective life of the programme into account, as the 
setting up costs are relatively high but costs decrease 
with subsequent use because most of the components 
are re-usable. 

(vi) From what group were the norms derived? 
Much of the value in a standardised test comes from 
being able to compare an individual's score with those 
of a representative sample of peers. To be of most value, 
the norms on a test should be derived from the same 
population as those taking the test. If this is not the case 
a judgment must be made whether valid comparisons 
can be made between those taking the test and the 
norms sample. Little purpose would be served in 
administering a test of mechanical comprehension to 
third form technical students if the norms for the test 
had been derived from the performance of university 
engineering students. 

One result is that teachers and other test users 
are forced to use unadapted overseas norms when 
interpreting the performance of New Zealanders and 
Australians. This is a far from ideal situation, which gives 
rise to test information that is neither valid nor reliable. 
The widely used Burt (Rearranged) Word Reading Test, 
and its successor the Burt Word Reading Test (1974 
Revision), can be used to illustrate this point. The Burt 
(Rearranged) Word Reading Test was normed on a 
sample of Scottish children in 1955, the arrangement of 
words in the test and the procedures for computing the 
so-called 'reading age' being adjusted to reflect the 
word reading skills of Scottish children. This test has 
been used in its original form in New Zealand, on the 
assumption that there is no difference between the 
performance of New Zealand and Scottish children on 
the test. 

A revision of the 1955 version of this test was 
published in 1974. The Scottish figures show that 
children now have to read more words correctly to get 
the same 'reading age'. For example, on the 1955 
formula 30 words read correctly gave a reading age of 
8.0 years but in the 1974 revision 30 words read 
correctly result in a reading age of 6.7 years. To be 
credited with a reading age of 8.0 years, 48 words now 
need to be read correctly. The Burt Word Reading Test 
(1974 Revision) is now being widely used and the 
assumption is that these new Scottish norms validly 
represent the performance of New Zealand children on 
this test, which seems unlikely. 

The New Zealand test user is not well supplied with 
tests which present current New Zealand norms. Apart 
from the Progressive Achievement Tests, locally normed 
tests which have direct application in schools are 
restricted to the Otis Tests of Mental Ability, the ACER 
Silent Reading Tests, the ACER Arithmetic Tests and the 
Oral Word Reading Test. The last three named tests 
were all standardized in 1964, so the normative data is 
now somewhat limited. Due in 1981 is the New Zealand 
developed and normed Test of Scholastic Abilities 
(TOSCA) which should replace the Otis. Due also in 
1981 are two other normed tests, the Proof Reading Test 
of Spelling and a New Zealand standardization of the 
Burt Word Reading Test. 

On the surface at least, test users in Australia are 
better supplied. They have more achievement tests (e.g., 
ACER Primary Reading Survey Tests, ACER 
Mathematics Profile Series), more general ability tests 
(e.g., ACER Lower Grades General Ability Scale, ACER 
Tests of Learning Ability) and more special purpose tests 
(e.g., ACER Checklists for School Beginners, ACER 
Shorthand Aptitude Test). However, the diverse 
population, coupled with different state educational 
systems, may mean that the number of tests suitable for 
a specific use is strictly limited. Tests are usually 
developed for particular populations and education 
systems, so the Australian user must be vigilant in 
making sure that the test is suitable for his or her 
purposes. 

Conclusion 

By considering a test's validity, reliability and usability it 
will be possible to decide whether the test will perform 
the function you have in mind. If an evaluation of the test 
indicates that it does not meet your requirements, time, 
effort and money have been well saved. It will also 
ensure that tests are used as the servant of teachers and 
pupils, and do not become their master. 

Evaluation Checklist 

The characteristics of tests that have been outlined in this 
article can be used as a basis for evaluating the potential 
worth of any test. By systematically recording 
information about those major characteristics, 
information that can be used in decision-making can be 
quickly summarized. Item 9 of this set is such a list. It is 
not copyright and you may duplicate it as you wish. 



Suggestions for Further Reading 

Bauernfeind, R.H. 
Building a School Testing Programme. Boston, 
Houghton Mifflin, 1963. (A good introductory text.) 

Ebel, R.L. 
Essentials of Educational Measurement. Englewood 
Cliffs, New Jersey, Prentice-Hall Inc., 1972. 

Gronlund, N.E. 
Constructing Achievement Tests. Englewood Cliffs, 
New Jersey, Prentice-Hall Inc., 1977. 

Lyman, H.B. 
Test Scores and What They Mean. Englewood Cliffs, 
New Jersey, Prentice-Hall Inc., 2nd ed., 1971. (A good 
introductory text.) 

Thorndike, R.L. and Hagen, Elizabeth 
Measurement and Evaluation in Psychology and 
Education. New York, Wiley, 1969. 






Test Evaluation Sheet 

Test Title ....................  Author(s) .................... 

Publisher .....................  Publication date ............. 

Manual 
    Functions outlined 
    Guide for interpretation 
    Guide for use of results 

Validity 
    Type(s) reported and values shown: 
        Content   Concurrent   Predictive   Construct 
    Criteria described 
    Face - item arrangement 
         - page layout 
         - quality of illustrations 

Reliability 
    Type(s) reported and values shown: 
        Split-half   Parallel forms   KR21   Test-retest 
    Method used 
    Sample(s) described 
    Standard Error of Measurement 

Usability 
    Administration 
        - special training necessary 
        - time - power/speed 
        - suitability of instructions 
        - practice exercises 
        - overlapping/discrete 
    Scoring 
        - type of key 
        - ease of conversion 
        - treatment of errors 
    Examinee appropriateness 
        - instructions 
        - items 
        - mode of response 
    Norms 
        - types of derived scores: 
          PRs (age)   PRs (class)   Deciles   Stanines   Other 
        - age range covered 
        - population described 
        - sampling/number of cases 

Economy 
    Cost of each component: 
        Manual $   Booklets $   Answer sheet $ 
        Marking Key(s) $   Other $ 
    What is re-usable? 
    Time for test marking 
    Time for interpretation 




Test Evaluation Sheet (completed example) 

Form by Cedric Croft, NZCER 

[A completed copy of the Test Evaluation Sheet, with handwritten 
entries in a Comments column, appeared here. The handwritten 
entries are not legible in this copy.] 



Assessing What 
They've Learned 



By Warwick B. Elley, University of the South Pacific 




Sue Price 



1. Introduction 



Why do we test pupils? 

Most teachers spend a lot of time and effort preparing 
tests, giving tests, marking tests and using tests for one 
purpose or another. Why? The main functions of tests and 
examinations in school are: 

(a) Mastery. Classroom tests are often prepared by 
teachers to see whether pupils have mastered a particular 
unit or skill that has just been taught. 

e.g., Teacher gives a quick quiz to see whether pupils have 
learned how to multiply fractions; or know the main events 
and characters in the book "Animal Farm"; or have 
learned the main vitamins and the foods that contain them. 

(b) Diagnosis. Tests are often used by teachers to 
determine the major weaknesses that a child shows in 
basic skills. 

e.g., In mathematics, to see whether pupils know basic 
number facts, or can handle zeros, or understand how to 
divide fractions; in reading, to see whether pupils' 
difficulty is due to poor vision or hearing, or word attack 
skills, or vocabulary weakness, or some misconceptions 
about print. 

(c) Reporting Progress. Formal examinations are often 
used to report on the progress made by pupils over a term, 
or school year. The results are of interest to parents, 
pupils, potential employers, and teachers in next year's 
classes. 

In addition to these three main uses, formal tests and 
examinations are often used 

(i) to place children in ability groups; 

(ii) to select pupils for further education, 
scholarships, etc.; 

(iii) to match pupil materials with pupil abilities; 

(iv) to evaluate one's own instruction; 

(v) to assist with vocational guidance; 

(vi) to determine a child's readiness for learning; 

(vii) to undertake research on pupils' abilities, or on 
teaching methods; 

(viii) to determine whether educational standards in a 
school, or total system are changing. 

As teachers, we should be clear why we are testing. We 
should not test just because it is always done. Certainly, 
tests often do help motivate students to work harder. But 
the results can be discouraging too, if they are poor. 

In fact, the purpose of the test should affect the kind of 
test given. Thus, a formal selection examination will 
normally cover a large number of skills and topics lightly, 
and most questions should be of middle difficulty level. 
Mastery tests on a particular unit will be more intensive, 
with several questions on each of a few topics to see 
whether they are mastered or not. Some tests should be 
relatively hard for all pupils (e.g., diagnostic tests). Some 
will be lengthy and formal with elaborate marking schemes 
(e.g., end-of-year examinations); at other times the 
teacher will give short, informal quizzes with emphasis on 
quick feedback. Sometimes the need is for many short 
questions which can be objectively marked. At other times 
the teacher's purpose will be better served with a few long 
answer questions. It is important therefore that we think 
through our reasons for testing. 

2. Quality of a good examination 

Not all examinations are well set. Often they are too hard, 
or too easy, or not enough time is allowed. Sometimes the 
questions are vague, or trivial, or provide clues for the 
'test-wise' pupil. Many examinations are unbalanced, 
providing too many questions on some aspects, and too 
few on others. Such weaknesses mean that decisions 
based on the examination results will be unfair, or 
pedagogically unsound. How can we tell if our 
examination is a good one? Two important features are 
Reliability and Validity. 

(a) Reliability 

Tests are reliable if they produce consistent results, if they 
produce similar marks on different occasions. If a pupil 
gains 75% in a reading comprehension test today, and 
a far lower mark tomorrow, then the results are not consistent; 
the tests are not sufficiently reliable to base judgements 
on. If a pupil is placed first in his class in a test of 
multiplication and division of decimals today and is 20th in 
a similar test tomorrow, we can conclude that the tests are 
not reliable indicators of his ability. 

To be reliable a test must normally be long enough to 
minimize the effects of chance factors in the content and 
skills included in the test. With a short test, a pupil may be 
lucky, because he happened to know or guess correctly the 
few questions that were asked, whereas he knew very little 
about the areas untouched by the test. A standardized test 
of reading, mathematics or language normally needs at least 
40 good objectively marked questions, to reach a 
satisfactory level of reliability. To make decisions about 
individual pupils, for placement, or grouping, or diagnosis, 
a teacher-made test will probably require more questions 
than this; for judgements about the performance of whole 
groups, a teacher can get by with fewer. 

Just how long a particular test should be depends on the 
type of material tested, the amount of supplementary 
information available, and the importance of the decisions 
being made. Thus a test of a highly specific skill such as 
arithmetical addition, or spelling, or typing, may produce 
reliable results within ten minutes. If however, we wish to 
examine a pupil's grasp of a variety of mathematical 
relationships, or his understanding of a period of history, 
and then to make decisions about future schooling on the 
basis of the results, we may wish to extend the test over 
two or three hours to gain maximum reliability. For such 
general skills as essay-writing ability, or oral expression, it 
is commonly found that pupils vary so much in their 
performance from day to day and from topic to topic, that 
the only sure way to gain adequate reliability is to test the 
pupils on several occasions and several topics, and 
combine the marks given by two or three independent 
markers. This may not always be practicable, but we 
should realize when our results are likely to be fallible. 
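
The link between test length and reliability can be made concrete with the Spearman-Brown formula, a standard result from test theory that this article does not itself present. The sketch below is an illustration only: it assumes the added questions (or additional markers) are of the same quality as the existing ones, and the starting reliability of 0.60 is a hypothetical figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made length_factor times as long,
    assuming the added material is comparable to the original."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical short quiz with reliability 0.60.
for factor in (1, 2, 3):
    print(factor, round(spearman_brown(0.60, factor), 2))
```

The same arithmetic applies when marks from several independent markers are combined, which is one way of seeing why two or three markers are recommended for essays.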

Other requirements of a reliable test are clear, precise 
directions, and reasonable time limits. If students are 
rushed, their performance may not be typical. The 
questions should be clear and unambiguous, neither too 
easy nor too difficult; they should normally discriminate 
well between good and poor pupils, and they should be 
capable of reasonably objective marking. Otherwise the 
results will vary according to the values and whims of the 
marker. If a choice of questions is allowed reliability 
usually drops, because markers cannot compare answers 
so consistently. 
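
Two of these requirements, appropriate difficulty and discrimination between good and poor pupils, can be checked after a test has been given. The sketch below shows one simple way of doing so; the upper-half versus lower-half comparison is a common convention chosen for this illustration, not a procedure described in the article, and the marks are invented.

```python
def item_statistics(marks):
    """marks: one list per pupil of 0/1 marks, one entry per question.
    Returns a (difficulty, discrimination) pair for each question."""
    ranked = sorted(marks, key=sum, reverse=True)   # best total score first
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    stats = []
    for q in range(len(marks[0])):
        # Difficulty: proportion of all pupils answering correctly.
        difficulty = sum(pupil[q] for pupil in marks) / len(marks)
        # Discrimination: upper-group success rate minus lower-group rate.
        discrimination = (sum(p[q] for p in upper) - sum(p[q] for p in lower)) / half
        stats.append((difficulty, discrimination))
    return stats

# Invented marks for six pupils on three questions.
marks = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
for number, (diff, disc) in enumerate(item_statistics(marks), start=1):
    print(f"Question {number}: difficulty {diff:.2f}, discrimination {disc:.2f}")
```

A question that everyone answers correctly has a discrimination of zero: it tells the teacher nothing about differences between pupils.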

These are some of the more important factors in 
determining how reliable a test will be. It is possible to 
assess the reliability of a test statistically, but that is a topic 
for another time. 

(b) Validity 

A good test must be valid. This means that, in addition to 
measuring a pupil's achievement consistently, it should be 
relevant to the main objectives of the course. It should 
cover the unit or course adequately, sampling each content 
area and skill in appropriate proportions. If a teacher 
knows precisely what his objectives are, he can usually 
tell, by analysing the questions of a test, whether they 
conform closely to the objectives he has adopted, i.e., 
whether the test has 'content validity' for his purposes. 

To illustrate, a 60-item test of addition in arithmetic may 
be highly reliable, and yet be quite invalid for measuring 
achievement in a course of modern mathematics which 
emphasizes concepts, relationships and reasoning. The 
objectives of the test do not match the teaching objectives. 
A 3-hour written examination in manual arts may give 
reliable results. But if it does not require students to show 
the actual skills they have learned, it will have poor 
validity. The students who do well on a written 
examination may not be those who do well in the practical 
skills. 

Again a test of geography which focusses on factual 
details about populations, areas, climate, exports, capital 
cities, and the like would produce irrelevant results for a 
teacher who stressed broad concepts, generalized skills 
and underlying relationships. A valid test of such 
objectives may require novel or fictitious situations on 
which to base questions so that a pupil can demonstrate 
that he has attained these objectives, regardless of the 
particular factual details he has acquired. 

Sometimes tests lack validity because of 'cultural bias'. 
Questions may be unfair because they assume that the 
pupil has had particular experiences which he has not had, 
or read books which were not accessible to him, or seen 
films or T.V. programmes which he has not seen. 
Sometimes the test may be invalid because the print is 
illegible, or the diagrams unclear, or the paper 
inadequately proof-read. Such problems may distract 
pupils and cause changes in their rank order. Likewise, 
cheating will result in invalid results. If pupils can copy 
one another's answers, or gain prior knowledge of the 
questions, the results will not reflect their true grasp of the 
subject assessed. 

To ensure maximum validity for his tests, then, it is 
important for a teacher to spell out, as clearly as possible, 
precisely what his objectives are, and to build his 
questions around these, in the appropriate proportions. 
Tests which develop without such planning often 
degenerate into factual quizzes of the low-level, isolated, 
easily testable fragments of the course. 

Guidelines for Checking Reliability 

1. Is the test long enough? 

2. Are the questions clear? 

3. Are the time limits realistic? 

4. Are the questions of appropriate difficulty? 

5. Is the marking effective? 

6. Are the instructions clear? 

7. Has the choice of questions been kept to a minimum? 

Guidelines for Checking Validity 

1. Are the questions relevant, important? 

2. Have all topics been assessed in appropriate 
proportions? 

3. Have all skills been assessed in appropriate 
proportions? 

4. Are there clues to the right answers? 

5. Is the typing and presentation adequate? 

6. Have all students had an adequate opportunity to learn 
the material tested? 

7. Is security adequate to avoid cheating? 

3. Planning the test 

If a test is not well balanced it will not be valid. Therefore, 
to ensure proper balance, it is a good idea to draw up a 
plan or blueprint. List the main topics to be covered on 
one axis and the major skills to be developed on the other. 

Thus, a blueprint for a unit on Mathematics might look 
like this: 

                              Skills 

Topics         [illegible]  [illegible]  Application  Total 

[illegible]         3            -            5          8 
Fractions           1            3            5          9 
Measurement         2            3            5         10 
Decimals            1            4            5         10 
Statistics          3            5            5         13 

Total              10           15           25         50 



The teacher who prepared this plan has clearly decided 
that the most important objectives in his course are those 
concerned with applying the skills learnt in new situations, 
rather than memory work or routine computational skills. 
Therefore, 25 of the 50 questions are devoted to 
application. Likewise, statistics is given more weight than 
the other topics, although all receive some weight. 
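
A blueprint of this kind is easy to check mechanically. The sketch below stores a plan as a small table and reports the weight given to each skill and topic. The labels "Topic 1", "skill_a" and "skill_b" are placeholders, and the individual cell values are assumptions chosen so that the rows and columns add up to the totals quoted above (10, 15, 25 and 50 questions, with 13 on statistics).

```python
# Number of questions per topic and skill (cell values are assumed, see above).
blueprint = {
    "Topic 1":     {"skill_a": 3, "skill_b": 0, "application": 5},
    "Fractions":   {"skill_a": 1, "skill_b": 3, "application": 5},
    "Measurement": {"skill_a": 2, "skill_b": 3, "application": 5},
    "Decimals":    {"skill_a": 1, "skill_b": 4, "application": 5},
    "Statistics":  {"skill_a": 3, "skill_b": 5, "application": 5},
}

def topic_totals(plan):
    """Questions allotted to each topic (row totals)."""
    return {topic: sum(cells.values()) for topic, cells in plan.items()}

def skill_totals(plan):
    """Questions allotted to each skill (column totals)."""
    totals = {}
    for cells in plan.values():
        for skill, count in cells.items():
            totals[skill] = totals.get(skill, 0) + count
    return totals

def weights(totals):
    """Share of the whole test given to each entry."""
    grand_total = sum(totals.values())
    return {name: count / grand_total for name, count in totals.items()}

print(skill_totals(blueprint))
print(weights(topic_totals(blueprint)))
```

Comparing these weights with the share of teaching time each topic and skill received is a quick check on the balance of a draft test.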

Planning of this kind should be undertaken in every 
subject and the weights given should reflect the amount of 
emphasis given to the topics and skills during the teaching 
of the unit or course. For example, in Social Studies, the 
topics to be weighted might be: 

Location, Climate, Discovery, Early Settlement, 
Industry, Transport, Culture. The skills assessed might be 
Recall, Comprehension, Application, and Evaluation. 

An English test might have as its main topics: Written 
Language, Oral Language, Grammar, Fiction, Poetry. The 
skills to be tested might be Knowledge, Comprehension, 
Application, Synthesis (production of original work), and 
Evaluation. 

4. Test questions 

Question writing is an art that depends on clear 
understanding of the subject and of the pupils being 
assessed, as well as a grasp of the general principles of 
item writing. It helps, also, to have plenty of time, some 
imagination, access to other people's questions as models, 
and an opportunity to have your questions edited by 
colleagues. 

Several kinds of questions can be used, and none is ideal 
for all circumstances. Written test questions can be simply 
divided into two types: 

(a) Objective Questions: These have right or wrong 
answers, and markers should agree on which are right and 
which are wrong. 

(b) Subjective Questions: Essay-type tests in which the 
pupils must respond to open-ended questions by 
composing their own answers. There are varying degrees 
of completeness and correctness. 

OBJECTIVE TEST QUESTIONS 

(a) Multiple-choice 

These consist of a stem, stating the question, and 4 or 5 
options to choose from. 

e.g. (i) What is the area of a rectangle which is 5 cm long 
and 3 cm wide? 

A. 8 sq. cm. 
*B. 15 sq. cm. 
C. 16 sq. cm. 
D. 30 sq. cm. 

Note that the 'distractors' (A, C and D) should be 
plausible answers for the pupils who might be unsure. 

(ii) If bread is placed in a refrigerator, it will not 
become mouldy so quickly, because: 

*A. cooling slows down the growth of fungi 
B. darkness retards the growth of mould 
C. cooling prevents the bread from drying out 
D. mould requires both heat and light for growth 

This question requires the pupil to apply his knowledge of 
the relationship between temperature and the growth of 
moulds. 

(b) Matching Questions: 

These consist of two columns of items selected so that 
pupils can match the words or symbols in one column with 
the appropriate word or phrase in the other. Matching 
questions are useful for testing homogeneous sets of facts 
e.g., matching books with their authors, chemicals with 
their formulas, words with the parts of speech they 
represent, etc. 



Country                      Capital City 

1. Fiji            ( )       A. Rarotonga 
2. Tonga           ( )       B. Suva 
3. Western Samoa   ( )       C. Vila 
4. Cook Islands    ( )       D. Nuku'alofa 
                             E. Honiara 
                             F. Apia 



Note that both lists should be homogeneous, and one 
should be longer than the other. The main fault in 
preparing these questions is that each list often contains 
terms which are heterogeneous. They should all be 
authors, or cities, or chemicals, or parts of speech, etc. If 
cities are mixed with minerals, and people, and 
organizations, they provide obvious clues to help the 
uninformed pupil. 

(c) True-False Questions: 

These consist of a single statement which the pupils are to 
mark true or false, or right or wrong. They are useful 
questions for a quick quiz, but guessing can be a serious 
problem with this type of question. This can be reduced 
somewhat by asking pupils to correct the false statements. 



                                   Ring T      If F, write the 
                                   or F        correct answer 

(i)   25% of 44 is 4               T or F      ............... 
(ii)  The volume of a mass 
      of gas tends to 
      increase as its 
      temperature increases        T or F      ............... 
(iii) Fiji's chief export 
      is copra                     T or F      ............... 



(d) Completion Questions: 

These consist of a question or sentence containing a blank, 
for which the pupils must supply the appropriate word, 
symbol or phrase. These questions are actually 
'semi-objective', because there is often more than one 
acceptable answer. 

e.g., (i) What is the name of the instrument used to 
measure temperature? __________ 

(ii) The device used to tell whether an electric charge 
is positive or negative is: __________ 

(iii) What is 25% of 44? __________ 



Which Type of Objective Test Question Should You Use? 

There is no one best type of item. All are appropriate at 
one time or another, but multiple-choice questions are 
more widely used than others in standardized tests and 
important examinations. The following advantages are 
often claimed for multiple-choice questions. 

(i) They are more objective and reliable than essay 
tests or completion questions. 

(ii) They make possible the testing of a larger sample 
of the pupils' knowledge and ability in a short time 
than does the essay test. 

(iii) They enable the teacher to measure process 
skills as well as recall of simple knowledge. By 
contrast, true-false, matching and completion 
questions are largely restricted to simple recall. 

(iv) They are easy to mark in large numbers. 

(v) They make it virtually impossible for a pupil to 
gain a high score by guessing. 

(vi) Common weaknesses in pupil knowledge and 
skills can be readily diagnosed by the teacher. 

(vii) The questions themselves can be readily 
evaluated and improved by means of item analysis. 

On the other hand, multiple-choice questions do have 
these disadvantages: 

(i) They cannot measure pupils' creative skills, or 
ability to organize material in a coherent manner. 
This is particularly important in language, literature, 
and other expressive subjects. 

(ii) They take much time and skill to construct. 
Poorly prepared questions may produce more invalid 
results than completion or essay questions. 
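
The claim about guessing can be examined with a little binomial arithmetic. The sketch below is an illustration under simple assumptions that are not part of the article: every question is answered by blind guessing, the questions are independent, and the test has 40 questions with a 'high score' taken to mean 28 or more correct (70 percent).

```python
from math import comb

def chance_of_at_least(k, n, p):
    """Probability of k or more correct answers out of n when each
    answer is right with probability p (pure guessing)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_questions, high_score = 40, 28

# Four-option multiple-choice: one chance in four per question.
print("multiple-choice:", chance_of_at_least(high_score, n_questions, 0.25))
# True-false: one chance in two per question.
print("true-false:     ", chance_of_at_least(high_score, n_questions, 0.5))
```

On these assumptions a blind guesser is far less likely to reach a high score on four-option questions than on true-false questions, which is the contrast the text draws between the two formats.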



5. Suggestions for preparing test questions 

1. Essay Questions: 

(i) Specify clearly what is to be included in the answer. 
Compare: (Poor): Write an essay on the French 
Revolution. 
(Better): In not more than 500 words, 
(a) Outline the main causes of the 
French Revolution, 
and 
(b) Explain why reform could not be 
obtained without violence. 

(ii) Use several short questions rather than one long one. 

(iii) Avoid optional questions where possible, as they 
make marking more difficult. 

(iv) Before the test, prepare a model answer, outlining the 
main criteria and weights to be attached to each. 

(v) Mark one question for all pupils before beginning the 
next. 

(vi) Mark without knowing the pupils' names, where 
possible. 

(vii) Obtain independent assessments, wherever you can. 
The average of two markers is more reliable than the 
results from one. 

2. Objective Questions: 

GENERAL 

(i) Keep your questions brief, simple in expression, and 
free from complex verbal instructions, double 
negatives, etc. 

(ii) Test only the important facts and skills. Avoid trivial 
questions, 'catch' questions, and irrelevant material. 

(a) MULTIPLE-CHOICE QUESTIONS 

(i) The problem should be clearly stated in the stem of 
the question. 

e.g., (poor) 
Bats 

A. drive off harmful birds 
B. are enemies of man 
*C. eat insects 
D. eat rats 

The pupils must read all options before they 
understand what the problem is. 

(ii) Use only plausible distractors 

e.g., (poor) 

The Prime Minister of Fiji is 

A. Mr Muldoon 
B. Mr Fraser 
*C. Sir Kamisese Mara 
D. The Shah of Iran 

Many pupils could guess the answer with very 
limited knowledge. Names of other prominent 
Fijians would provide better distractors. 

(iii) Ensure that there is only one correct answer 

e.g., (poor) 

The population of Hamilton is 

A. less than 50 000 
B. between 50 000 and 70 000 
C. over 70 000 
D. over 80 000 

Both C and D are correct. 

(iv) Avoid the stereotyped language of textbooks in the 
correct answer 

e.g., (poor) 

The Renaissance in Europe was characterised by 

A. a decline in trade 
B. many religious wars 
*C. an unusual efflorescence of creative talent 
D. the loss of colonies 

(v) Beware of grammatical clues and verbal associations 

e.g., (poor) 

The French scientist who discovered the basis for 
pasteurising milk was 

*A. Louis Pasteur 
B. Isaac Newton 
C. Francis Bacon 
D. Alexander Graham Bell 

There are two clues to the right answer here. The 
question should be rephrased. 

(vi) Make the correct option the same length as the 
distractors 

e.g., (poor) 

Sweets are not recommended for eating between 
meals as they 

A. cause diabetes 
B. supply excess energy 
C. stimulate the appetite 
*D. dull the appetite for foods rich in other necessary 
elements 

D sounds right because it is longer, and so makes for 
a fuller statement. 



(b) COMPLETION QUESTIONS 

(i) Use a single blank in each question 

e.g., (poor) 

The ________ of ________ was written by ________ 

A pupil may know the facts required, but be 
confused by the question. 

(ii) Place the blanks near the end of the sentence 

e.g., (poor) 

________ is the name usually given to the 
breakdown of the soil by various processes. 

e.g., (better) 

The breakdown of the soil by various processes is 
usually called ________ 

(iii) Make all blanks the same length 

e.g., (poor) 

Vila is the capital city of the ___ ________ 

A pupil who was not sure whether to choose 
Solomon Islands or New Hebrides would have an 
obvious clue here. 

(iv) Make sure that there are a finite number of acceptable 
answers 

e.g., (poor) 

Columbus discovered America in ________ 

e.g., (better) 

In which year did Columbus discover America? ________ 



6^ Conclusions 

Much more could be said about writing sound
questions. However, a careful reading of the
principles outlined above and some meticulous
editing by your colleagues should make for better
reliability and validity than that in a test which grows
'Topsy-like' without planning and forethought.

Teachers who wish to improve their assessment
skills further can learn much from studying examples
of well prepared examinations and standardised
tests, and from analysing the results of their own
tests, using item analysis. This and other topics can
be followed up in such books as:

Ebel, Robert L. Essentials of Educational Measurement, New
Jersey, Prentice-Hall, 1972 (the test developer's bible).

Peddie, Bill, and Graham White. Testing in Practice, Auckland,
Heinemann Education, 1978 (short, pithy, a practising teacher's
guide).

Queensland Department of Education. School Assessment
Procedures. Titles: 1 An Introduction. 2 The Multiple Choice
Item. 3 Assessment in English. 4 Moderation Within Schools. 5
Assessment in English. 6 Assessment in Foreign Languages. 7
Planning a Summative Assessment Programme. 1971-5.
Available from ACER.

Izard, J.F. Construction and Analysis of Classroom Tests,
Melbourne, ACER, 1977.



Criterion-referenced 
Measurement 








Dick Frizzell






Glenn Rowley and Colin Macpherson
Monash and La Trobe Universities



A man once owned a dog which was inclined to jump the
back fence and enjoy the delights of the neighbourhood.
Deciding that he needed a new fence around his yard, the
man was confronted with the problem of determining how
high the fence should be. Because he wanted to approach the
task systematically, he took his dog to a testing agency, where
the animal was put through an extensive series of jumping
tests. Eagerly, he awaited the results of the tests, which took
some time to arrive. The testing agency, you see, carried out a
nationwide dog-testing program, and the results had to be
processed by computer, along with those of thousands of
other dogs.

Finally, the test results arrived in the mail. They were very
detailed. His dog, he learned, was about average for
Australian dogs. It was, however, well above average for
dachshunds, and a little below average for greyhounds. He was
told that nationwide norms and even neighbourhood norms
could be provided given time and money. Although he'd love
to have known how his dog compared with those in the next
street, the owner regretfully declined. The tests,
unfortunately, had not told him what he had set out to find:
how high a fence his dog could jump. Had he asked an
unanswerable question or had he just asked it of the wrong
people?

For twenty years now protagonists of criterion-referenced
measurement have been saying that testing has been giving us
the wrong kind of information. Tests, they argue, have been
designed to provide relative information about children
(where does Johnny rank?), when what we need is absolute
information (what skills does Johnny possess?). Since about
1970, considerable effort and scholarship have been
dedicated to finding, developing and promoting ways to
make tests which yield this latter sort of information. These
efforts have resulted in new terminology, new ways of
constructing and analysing tests, and new ways of reporting,
explaining and interpreting children's performances.



What is criterion-referenced measurement?

If I were to tell you that Mark had just scored 15 out of 20 in a
geography test, what would you know about him? Very little!
Geography tests cover a wide range of content, and even
those written for one specific grade level can range from very
easy to very difficult. For many tests, his score would be
dependent as much on the judgements and whims of the
marker as on Mark's own performance. If you are to
understand anything at all about his achievement in
geography, I will need to provide you with more information
than just his score.

Historically, educators have recognized that more
information is needed, and have sought to provide that
information in the form of comparisons. If we knew that it
was the third-highest score obtained in a class of 31, we would
feel more comfortable about evaluating Mark's achievement.
If we knew that the average score of the class was 17, we
would still feel comfortable about it, although our evaluation
would be quite different. If we know about how the class
compares with other classes we will be even more
comfortable.

Some teachers have been inclined to treat test scores as if
they have absolute meaning, i.e., a score of 80 percent has
its own intrinsic meaning, and if Chris scores 65 percent on a
History test and 80 percent on a Spelling test, then he did
better at Spelling than he did at History. We know, of course,
that this need not be so. We know of the differences that exist
between teachers in the standards they expect, in the
difficulty of the tests that they set, and in the stringency of
their marking procedures.

Norm-referencing is one way in which educators have
sought to escape from this dilemma. Over the past 70 years or
so, an armoury of techniques has been developed that can add
meaning to a single test score by comparing it to some
reference group (or norm group). Thus, standardized tests
provide tables, which can be used to refer a single score to the
distribution of scores obtained by carefully-chosen
representative samples of pupils of the same age or class level
throughout the state, or even the nation. Sally's score of 32 in
Spelling takes on a new meaning if we know that 75 percent of
her peers scored 32 or less, that is, that she is at the
seventy-fifth percentile of a national sample of children of her
age. Furthermore, it is possible to refer scores from different
spelling tests to the same scale, provided the tests are normed
on the same group. Norm-referencing has provided us with a
range of techniques and derived scores, such as percentile
ranks, age and class norms, profiles, standard scores,
T-scores, stanines, etc., which are intended to add meaning to
a single score by locating it in a distribution of scores from
comparable children. It should be remembered, though, that
the meaning which is added is relative meaning, and
psychometricians have sometimes given the impression that
test scores have no meaning except relative meaning. One is
reputed, when asked, 'How's your wife?', to have replied
'Compared to what?'.
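
The percentile rank in Sally's example can be worked out
directly from the scores of a norm group. The short Python
sketch below is an illustration only: the function name
percentile_rank and the twenty norm-group scores are invented
for the purpose, and it uses the 'scored at or below'
definition given above (other definitions of percentile rank
are also in use).

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring at or below `score`."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical norm group of 20 spelling scores (invented for illustration).
norm_group = [18, 20, 22, 23, 25, 26, 27, 28, 29, 30,
              30, 31, 31, 32, 32, 34, 36, 38, 40, 43]

sally = percentile_rank(32, norm_group)
print(f"A score of 32 sits at percentile {sally:.0f} of this norm group")
```

The same raw score referred to a different norm group would
give a different percentile rank, which is the sense in which
the added meaning is relative.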

In 1963, Robert Glaser published an article called
'Instructional technology and the measurement of learning
outcomes' in American Psychologist. Although brief, the
article proved to be of historic importance, because it was in
this article that the term 'criterion-referenced measurement'
was introduced to the world. Glaser was deeply involved in
the development of procedures for individualized
instruction, and he noted that the measurement techniques
which he had learned (essentially how to construct and
evaluate good norm-referenced tests) did not seem
appropriate to his needs. He knew how to build tests that
were effective at spreading kids' scores out, so that an
accurate and reliable ranking of their levels of achievement
could be obtained. But he wanted a test which could tell him
that Penny had effectively mastered this unit of instruction
and was ready to proceed to the next. Where Penny stood in
relation to other children (or, equivalently, where other
children stood in relation to Penny) was irrelevant to the
decision which had to be made. What was needed, Glaser
argued, was a criterion-referenced test: one which drew its
meaning not from the relation between a score and a set of
other scores, but from the relation between the test and a
criterion (or domain of behaviours) which the test is designed
to represent.

Glaser's argument struck a responsive chord among many
educators, although his ideas took some years to take root.
The real breakthrough began in 1969 when James Popham
and Ted Husek published an article in the Journal of
Educational Measurement entitled 'Implications of criterion-
referenced measurement'. Chatty and easy to read, yet
bursting with important ideas, this article brought Glaser's
concerns before educators in a way which could not be
ignored. By raising a host of questions to which adequate
answers simply did not exist, Popham and Husek stimulated
an explosion of activity in the field of educational
measurement which led to the publication of over 600 articles
on criterion-referenced measurement by 1978, and which has
continued unabated to this day.

Where are we in testing today?

Over the past sixty years, there have been tremendous
technological and theoretical developments in testing,
particularly the analysis and selection of items, and the
refinement of tests. The influence of the computer has been
substantial. It has enabled test publishers to run trials on
items, and to develop norms on samples running to hundreds
of thousands in some cases, with very little inconvenience to
themselves. We have seen the development of whole new
areas of theory, e.g., reliability, validity, and generalizability,
leading to important new ways to appraise tests, and to an
awareness of the concept of error of measurement and the
lack of precision in all test values. We have methods of item
analysis which enable us to try items out, analyse the
responses, and to select and modify items so as to produce a
final version of a test having the 'very best' psychometric
qualities. And the state of the art is very advanced indeed. In
the 1970 manual for the California Achievement Tests (CAT),
sixteen reliability coefficients were reported, ranging from
.977 to .986 over grades 1 to 12. This is incredibly high. In
this regard at least, it seems that test development technology
has taken us about as far as it is possible to go.

On the other hand, however, we do not seem to be as well
advanced in understanding what it is that we are measuring.
In measurement jargon, we have learned to understand
reliability a great deal better than we understand validity.
While the reliability of the CAT compares favourably with
that of a ruler or a tape measure, the measures themselves do
not inspire the same degree of confidence. The difference, of
course, is that with the ruler, we understand much better
what we are measuring. I know fairly well what a score of
so many centimetres means when I measure with a ruler. But I
do not understand so well what a score of so many points
means on the CAT, or on other achievement tests.
Furthermore, when given an individual's score on a test, it is
not necessarily easy to see what should be done for the child.
Few test constructors would claim that the score alone will
tell you.
Yet it may be what you want.

As professional teachers we will not be using test scores in
isolation. Other objective and subjective information will put
the test score in perspective. The test score tells us that the
child stands high or low in comparison with some other
children but, as is well known, children can obtain low scores
for a variety of reasons. Not all children who score below the
20th percentile on a reading test are the same and nor should
we conclude that they should be treated in the same way. If
they are treated identically, it is certain that they will respond
very differently.

In summary then, during the course of this century, we
have become more and more proficient at developing precise
measuring instruments, but we have not progressed to
anything like the same extent in understanding the
measurements we make, or in making use of the information
that they give us. Why is this so?

Testing for Competence 

The use of tests as devices to certify that certain people have
achieved competence in certain fields has a long history,
going back at least 4,000 years. Certainly, formal
examinations were used in China as far back as 2200 B.C.
Public officials at that time were required to present
themselves for an examination every three years to determine
their fitness to remain in office. If, after three examinations,
they could not be promoted, they had to be dismissed! Civil
service examinations lasted in China until 1905, and
markedly influenced the development of civil service
examination systems in Britain, France and the U.S.A.

Examinations have also been used in universities for many
years. There are records of examinations being held at the
University of Bologna in 1219 A.D. By the middle of the
nineteenth century, written examinations were widely used in
Britain, Europe and the U.S.A. both for the awarding of
degrees, and for deciding who should be permitted to practise
professions such as law, teaching and medicine.

The tradition of using tests as devices to certify mastery of
some subject-matter and/or skills continues today. We are
usually reassured by the knowledge that our physician has
passed a long series of examinations, and the same is probably
true of lawyers, dentists, pilots, teachers, electricians,
plumbers and so on. The possession of a certificate attesting
to mastery of an area is seen by society as a way of ensuring a
minimal level of competence in various professions and
skilled trades.



Testing for Differentiation

There is a second enduring tradition which has contributed to
the development of our ideas about testing; this is the
tradition associated with the psychology of individual
differences. Since the late eighteen hundreds, a major interest
in psychology has been in the range of different qualities and
abilities which people possess, and in finding ways of
identifying and studying these differences. Although it was
not the first attempt, the Binet scale is generally seen as a
landmark in the study of individual differences. Binet's test
was individually administered. Subsequent years saw major
advances in the development of measures which could be
administered to groups (i.e., written tests). In the United
States, the greatest shot-in-the-arm to the development of
group testing techniques was provided by the advent of
World War I. Upon the United States' entry into the war,
various committees of psychologists were organized to
contribute to the war effort. One of these was a committee on
the psychological examination of recruits. The 'Army Alpha'
test which was developed by this committee was the first
group intelligence test to be used on any large scale. It was
administered to a million and a quarter men during the war,
and was used for selecting men for officer training and so on.
It appears to have been regarded as an enormous success.

The use of the Army Alpha test during the war sparked off
a long period in the United States when the major thrust of
educational psychology was towards the development of
measures of individual differences: firstly the so-called
'intelligence' tests and then later school achievement tests of
one kind or another. Various technological developments
helped to kick things along, including the development of
mechanical test-scoring machines, and later, of course, the
use of the computer. In the U.S. today, school children are
given standardized achievement tests to an extent which
would stagger most Australians and New Zealanders. It is a
routine part of their schooling, and the basis of a multi-
million dollar industry.

It is interesting to note that the technology used in the
development of large-scale standardised educational
achievement tests is essentially the same technology that was
used in the development of psychological intelligence tests.
The tests are intended for wide-scale, perhaps nationwide
use; therefore, the items have to be ones which test
generalized differences. Items that are specific to this or that
curriculum, or to this or that class level are excluded. If you
want a test to sell nationally, you make a test of 'reading
comprehension', or of 'arithmetical fluency', not one on
'ability to read shop names in High Street', or other specific
skills we may find it valuable to teach.

Educational achievement tests which have been developed
along these lines can sometimes look very much like
psychological aptitude or intelligence tests, and often will
have similar properties. What is interesting to note is that the
more closely the test approximates to this model, the better it
looks psychometrically, that is, to the statisticians who test
tests. Thus, as an achievement test is successively revised, its
statistical properties keep improving, and it becomes more
and more like an intelligence test, and less and less a measure
of the actual content that is taught in school.

Item Analysis for Norm-referenced Tests

There are several reasons why this situation has developed.
One is the nature of the procedures of item analysis which are
commonly used. These techniques, which are described in
detail in most textbooks on educational testing and
measurement, are used to identify 'good' and 'bad' items in a
test. Their effect, generally, is that items that tend to
contribute to a wide 'spread' of scores remain on the test,
while those that do not are discarded or modified. By
following these procedures, it is possible to produce a test
which has the finest psychometric properties.
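
One common form of the item analysis just described can be
sketched in Python. This is a minimal illustration, not the
procedure of any particular test publisher: the function name
item_analysis, the choice of top and bottom thirds as the
comparison groups, and the marks themselves are all
assumptions made for the example.

```python
def item_analysis(responses):
    """For each item return (difficulty, discrimination).

    responses: one list of 0/1 marks per pupil, one mark per item.
    difficulty: proportion of all pupils answering the item correctly.
    discrimination: proportion correct in the top-scoring group minus
    proportion correct in the bottom-scoring group.
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    totals = [sum(marks) for marks in responses]
    # Rank pupils by total score, highest first.
    order = sorted(range(n_pupils), key=lambda i: totals[i], reverse=True)
    k = max(1, n_pupils // 3)  # assumed group size: top and bottom thirds
    upper, lower = order[:k], order[-k:]
    results = []
    for j in range(n_items):
        difficulty = sum(marks[j] for marks in responses) / n_pupils
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Invented marks for six pupils on three items.
marks = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]
for number, (difficulty, discrimination) in enumerate(item_analysis(marks), 1):
    print(f"item {number}: difficulty {difficulty:.2f}, "
          f"discrimination {discrimination:.2f}")
```

The first item, which every pupil answers correctly, has a
discrimination of zero. Under a spread-the-scores rule it
would be discarded or modified, whatever it says about what
the class has learned.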

There are, however, serious problems. Firstly, the effect of
the item-selection procedure is to exclude items which are
unlike the rest of the items on the test, and to include mostly
items which are like the rest. So we end up with a test in which
the items all measure pretty much the same things, as of course
the test constructor intends. Psychometricians like to
describe the items as being unidimensional and the test as
having a high degree of internal consistency. Tests having this
property (which is highly valued in psychometrics) are the
most effective in spreading people out along a scale. They are,
then, very effective norm-referenced tests. They allow us to
rank people on the attribute which they measure with the
greatest degree of confidence. But there is a price to be paid
for this. What is the attribute measured? Is it what we wanted
measured? Is the test valid? Unless great care is taken, we can
produce a test which provides a pure measure of a pure
attribute, but fails to reflect accurately the various emphases
of the curriculum.

Given the procedures used in constructing the test, the
result is as near to inevitable as anything can be in education.
The procedures were designed originally to develop good
norm-referenced tests. They have been borrowed from the
procedures used in the psychology of individual differences
where they work very well. Psychologists want to measure the
underlying trait, whether it be general ability, or any of a
variety of special abilities. They want a measure which is
unidimensional, and hence psychologically pure. Generally,
they want a measure which describes a reasonably stable
property of an individual. But the procedures we have
borrowed from psychologists have served us less well, since
we are usually looking to measure changes we, as teachers,
have brought about.

The Use of Tests for Selection

In the early part of the century it was necessary to select those
in the primary grades who were most likely to profit from a
secondary education. Later on, various external examinations
in the senior years of high school have filled a similar
function, selecting those likely to profit from tertiary
education. In every case, education was seen as a commodity
which was to be made available to those who could prove
themselves most worthy of it. At a time when there was
simply not enough institutionalized education to go around
this made sense. When selection is the ultimate aim, we need
a test which can spread people out, and make reliable
distinctions among them (i.e., a good norm-referenced test).
The tests which were used served this purpose well enough, and
were therefore satisfactory in their own terms.

If we accept that one of the present roles of the school is to
help each child to learn as well as possible, we need to use tests
differently. It is only in university entrance examinations that
selection need be the major issue any longer. At all levels
below this, and most above, we ought to be using tests as an
instructional device: something to help us do a better job of
teaching, not something to help us decide who is worthy of
the benefits of our teaching. It is the realization and
acceptance of this which has led to the explosion of interest in
criterion-referenced testing in recent years.

How are criterion-referenced tests
constructed?

From what has been written already, it should be clear that
the basic difference between a norm-referenced test and a
criterion-referenced test is in the way in which scores are
interpreted, rather than in the test itself. It is not possible to
pick up a test and identify it as a norm-referenced test, or a
criterion-referenced test, just by examining it. Any test can be
norm-referenced, although some (e.g., those which produce
scores with little variability) may not be very effective
norm-referenced tests. But not every test can be
criterion-referenced. Unless the test has been constructed with
that purpose in mind, it may not be possible to relate test
scores to a clearly-defined set of skills or behaviours. Tests
which contain collections of items based on fuzzily-defined or
undefined objectives cannot yield satisfactory
criterion-referenced interpretations. If meaningful
interpretation is to be achieved, at least the following
requirements must be met.

1. The objectives of instruction must be clearly defined. If
students' performances are to be described as what they
can and cannot do, the objectives must indicate precisely
what skills or behaviours are aimed at. In practice this
means that all objectives have to be expressed in
behavioural terms, i.e., by specifying exactly what the
students will be able to perform, at the completion of the
teaching.

2. For each objective, sufficient items must be written to give
some assurance that achievement of that objective is
being reliably measured. Strictly speaking, one
criterion-referenced test measures the achievement of one
objective, although the term is used loosely to describe
collections of items which measure collections of
objectives.

3. Item selection must be on the basis of how well the items
reflect the behaviours specified in the objectives. Selecting
items on other bases (e.g., on how well they spread the
scores) makes it more difficult to provide
criterion-referenced interpretations of test performance.

4. Standards of performance must be specified. Sometimes
standards of performance can be related to out-of-school
situations. They help define what appropriate standards
are. Examples are: the proficiency needed to operate an
automobile, the proficiency needed for a particular job
(e.g., typing skills), the proficiency needed to be
self-sufficient in a complex society (e.g., writing a letter).
For other situations, the best we can do is insist upon
mastery. But what is mastery? Does it require 100 percent
success on items relating to that objective? If not, then
what level of success do we set as an indication of mastery?
Although much work has been done on 'the standard-
setting problem' it remains a matter which can only be
resolved by the use of human (and, in a sense, arbitrary)
judgment.

If these steps have been followed, scores from the test will
yield the kinds of information we seek. More detailed and
elaborate blueprints for the construction of criterion-
referenced tests can be found in other sources, e.g., W.J.
Popham's 1978 textbook, Criterion-referenced Measurement.
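
The four requirements can be made concrete with a small
Python sketch of how one pupil's marks might be reported
objective by objective. Everything in it is hypothetical: the
function name mastery_report, the two objectives, the marks,
and above all the 80 percent cutoff, which stands in for the
human judgment that requirement 4 calls for.

```python
def mastery_report(item_objectives, marks, cutoff=0.8):
    """Report, for each objective, whether the pupil reached the cutoff.

    item_objectives: the objective each item was written to measure.
    marks: the pupil's 0/1 mark on each item, in the same order.
    cutoff: proportion correct taken as mastery (an arbitrary choice).
    """
    tally = {}
    for objective, mark in zip(item_objectives, marks):
        correct, total = tally.get(objective, (0, 0))
        tally[objective] = (correct + mark, total + 1)
    return {objective: correct / total >= cutoff
            for objective, (correct, total) in tally.items()}


# Invented ten-item test: five items for each of two objectives.
objectives = ["add fractions"] * 5 + ["read a bar graph"] * 5
pupil_marks = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]

report = mastery_report(objectives, pupil_marks)
for objective, mastered in report.items():
    print(objective, "->", "mastered" if mastered else "not yet mastered")
```

Lowering the cutoff to 0.5 would turn the second objective
into 'mastered' without the pupil doing anything different,
which is the standard-setting problem in miniature.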



What are the limitations of criterion- 
referenced measurement? 

Naturally there are many, and we can only focus on a few of 
them. 

Criterion-referenced measurement has found its most
frequent use with curricula that can be defined as a finite
number of specific skills or behaviours which the pupils are to
master. While for many curricula, this may be possible, it is
clearly not universally so. For many teachers, the specification
of precise outcomes (and the same outcomes for all pupils) may
seem quite incompatible with their approach to teaching.
Some teachers may find that some of what they teach is
amenable to this approach, and some is not. In this case, their
testing strategies might embrace criterion-referenced
measurement only in part, and retain more traditional
approaches where they seem appropriate.

Some educators have suggested that criterion-referenced
measurement is appropriate for assessing the effects of
training, as distinct from education. It is possible to
distinguish between two types of objectives: mastery, or
minimum essentials (certain specific skills which can and
should be achieved by virtually all students and which are
necessary for further study), and developmental (more
generalized abilities such as problem-solving and clear
thinking, which one can never really claim to have achieved,
but towards which we hope all our students are progressing).
For mastery objectives, a criterion-referenced approach is
possible, and probably essential; for developmental
objectives it is much more difficult to apply. Thus criterion-
referenced measurement has been applied most effectively in
the basic skills areas, less so in parts of the curriculum where
the air is more rarified and the objectives harder to define.

The focus of criterion-referenced measurement is on the
achievement or non-achievement of certain competencies,
and the emphasis is not usually on the extent to which a
student has achieved excellence in an area. In fact, advocates
of criterion-referenced measurement frequently see the task
of testing as being to distinguish between students who have
mastered an objective and those who have not, between
'masters' and 'non-masters'. But not all of our teaching is of
this nature, and teaching which is designed to encourage
excellence in a field of study may not fit very comfortably
within such a framework. For many, and probably most of
the skills taught in schools, there are not just two levels of
competence, but an infinite variety, ranging from the highest
level of skill all the way down to complete ineptitude.
Criterion-referenced measurement, when used to classify
pupils into the categories of 'master' and 'non-master',
cannot portray the range of abilities present in a normal
group of children.

How long does a criterion-referenced test have to be to
provide an adequate sample of a behavioural domain? Rules
of thumb do exist (e.g., Popham suggests a minimum of 10
items) but the question is one which admits of no single
answer. If the tasks in the domain (and hence in the items) are
very similar, we can make do with fewer items; if they are
varied, we would need more items to achieve the same
accuracy. And, most importantly, the length of the test must
reflect the importance of the decisions made from it, and the
consequences of being wrong. The more crucial the decision,
the more items we would want to include.
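
A rough feel for why more items buy more accuracy can be had
from a simple binomial model. The Python sketch below assumes
(and it is only an assumption, not something the argument
above depends on) that the items are an independent random
sample from the domain and that the pupil can answer a fixed
proportion p of the domain correctly. The function name and
the figure of 0.7 are invented for illustration.

```python
import math


def standard_error(p, n_items):
    """Standard error of the observed proportion correct on n_items
    items, for a pupil whose true proportion correct is p, under a
    simple binomial model."""
    return math.sqrt(p * (1 - p) / n_items)


# How uncertain is the observed score of a pupil who 'really' knows
# 70 percent of the domain, for tests of different lengths?
for n in (5, 10, 20, 40):
    print(f"{n:2d} items: standard error {standard_error(0.7, n):.3f}")
```

Under this model the uncertainty falls only with the square
root of the test length, so halving it takes four times as
many items. How small it needs to be is a matter of how much
rides on the decision.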



Conclusion 

The advent (really, the rediscovery) of criterion-referenced
measurement has undoubtedly been an important step in our
thinking about education and testing. In many ways, we will
never be the same again, particularly in the way we report
information to parents. To the extent that criterion-
referenced measurement has forced us to focus our attention
on reporting what children can do and what they cannot do,
its effects cannot be anything but beneficial.

But criterion-referenced measurement is not going to be
the answer to all our problems, and its advocates would do
well to recognise that there are situations in teaching for
which it is just not useful or practical. One approach
emphasizes the kind of information we get by examining the
content of the test itself, the skills required to do well on it,
and so on. The other asks 'how well do comparable children
do on the same test?' The two kinds of information
complement one another, and in most situations we do not
have to choose between them; both are useful if they help
us to evaluate the child's performance. Both can be obtained
by studying the scores from a well-constructed criterion-
referenced test. Both can be obtained from a well-
constructed norm-referenced test that details item content
and has a comprehensive teachers' manual. There may be
occasions when we need only information about one child's
progress on one skill; then a criterion-referenced test is the
answer. However, there will be occasions when we want to
rank students not on a specific skill but on a broad range of
capabilities. In such cases the traditional type of norm-
referenced test would be more appropriate. The type of
information that is required and the ways that test scores are
going to be used should determine the type of test that is
administered.

It should also be pointed out that there is a price to be paid
by teachers who want to reap the rich educational harvest
offered by criterion-referenced tests. Such testing, if done
properly, takes quite a deal of time and effort. Teachers who
are already overburdened with the many demands of
preparation, teaching, counselling, etc., may find the
prospect of preparing whole sets of criterion-referenced tests
more than a little daunting, in spite of the benefits to be
gained. However, if teachers with similar teaching objectives
are willing to share the tests they write, a wealth of criterion-
referenced measurement material could be made widely
available after only a moderate effort by each individual.

An exciting possibility for the very near future marries the
growing interest in criterion-referenced measurement with
the introduction of microcomputers into many schools. It
appears that the next step to be taken in the computer
education movements in many countries may be the linking
of individual school systems to regional or statewide
networks. (Indeed, this has been the case in Tasmania for
some years now.) Criterion-referenced test specifications, by
definition, have a very high level of descriptive clarity.
Anyone reading them can fairly quickly be made aware of
exactly what is being measured by a particular test. Imagine
the quite feasible situation where a teacher is looking for a test
relating to the policies of early colonial governors. (S)he sits
down at a school micro-computer, hooks into the network
and within minutes is perusing the first level of specifications
for criterion-referenced tests that are in some way connected
to the keywords (s)he typed into the system. Those tests that
look most promising can be evaluated further by calling up
the next level of detail in the specifications. Finally, a test that
will suit the teacher's purposes is found and at the push of a
button the actual test and its specifications are printed out on
the school's printer. All that need be expected of the teacher
is that (s)he will at some time contribute to this criterion-
referenced test bank. But first it would benefit many people if
teachers, once they construct a criterion-referenced test, let
others in the same teaching area be aware of its existence and
availability.
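
The keyword search imagined here needs very little
machinery. The Python sketch below is purely illustrative:
the bank, its three entries, the field names and the function
search_specs are all invented, and a real shared bank would
need a good deal more (levels of detail, authorship, the test
items themselves).

```python
# Hypothetical test bank: every title and description is invented.
TEST_BANK = [
    {"title": "Colonial governors: land policy",
     "description": "Pupil can state two land policies of early "
                    "colonial governors."},
    {"title": "Fractions: addition",
     "description": "Pupil can add two fractions with unlike "
                    "denominators."},
    {"title": "Colonial trade routes",
     "description": "Pupil can trace three colonial trade routes "
                    "on a map."},
]


def search_specs(bank, keywords):
    """Return the titles of specifications whose description
    mentions every keyword (case-insensitive substring match)."""
    wanted = [keyword.lower() for keyword in keywords]
    return [spec["title"] for spec in bank
            if all(word in spec["description"].lower() for word in wanted)]


matches = search_specs(TEST_BANK, ["colonial", "governors"])
print(matches)
```

The search is only as good as the descriptions, which is why
the descriptive clarity of criterion-referenced
specifications matters for a scheme of this kind.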



Notes 

Excellent elementary level accounts of criterion-referenced
measurement are contained in

Popham, W.J. Modern Educational Measurement, Englewood
Cliffs NJ, Prentice-Hall, 1981 (Chapter 2).

and

Popham, W.J. Criterion-referenced Measurement, Englewood Cliffs
NJ, Prentice-Hall, 1978.

Two articles of historic importance, and which make excellent
reading, are

Glaser, R. 'Instructional Technology and the Measurement of
Learning Outcomes', American Psychologist, Vol. 18, pp. 519-

and

Popham, W.J. and Husek, T. 'Implications of Criterion-referenced
Measurement', Journal of Educational Measurement, Vol. 6, pp.
1-10, 1969.

For an up-to-date review of the many recent technical
developments in criterion-referenced measurement you might
consult either

Berk, R.A. (ed.) Criterion-referenced Measurement: The State of
the Art, Baltimore, Johns Hopkins University Press, 1980.

or

Hambleton, R.K. (ed.) 'Contributions to Criterion-referenced
Testing Technology', Special issue of Applied Psychological
Measurement, Vol. 4, No. 4, 1980.

Dr Glenn Rowley is Senior Lecturer in Education at Monash
University, Melbourne. Colin Macpherson is a teacher, and a
graduate student at La Trobe University, Melbourne.






Item 2 



set 



number two 1987



Investing in Item Banks 

By Neil Reid



MID-YEAR exams are looming, and Mr Davey has a
paper to set for his senior maths class. Going to his
classroom store cupboard he drags down a battered manilla
folder labelled 'Exams', bulging with dog-eared and
yellowing papers. He flicks through the top few copies
looking for last year's senior mid-year exam and the ones
for the three years previous. On a sheet of lined paper he
copies out those questions with double ticks or SSFG
(sorted sheep from goats) written in the margin. He studiously
avoids those with large crosses alongside or marginal
notes of 'hopeless', 'too hard', 'diagram problem', and
'takes too long'. In 30 minutes he has his mid-year exam
ready to take along to the school secretary for typing. Mr
Davey has, in fact, been using his own embryonic, and
rather crude, item bank.

What is an item bank?

ITEM BANKS, sometimes called 'item pools', 'question
banks', 'item files', 'test item libraries' or 'item collections',
are variously defined. For the purposes of this article
they are regarded as being a large collection of accessible
test questions. By 'large', we mean that the number of
items is many times more than would be used in a single
test. 'Accessible' means that the items are classified, indexed,
organized or arranged in such a way that they can
be retrieved readily for test or exam assembly purposes:
there is a system to make it easy for potential users to
reference the items and to choose among them. And,
under this relatively unrestricted definition, a variety of
items (questions) can be considered for inclusion in a
bank: true-false, multiple-choice, short answer, extended
answer (essay), even practical exercises. The items may
cover many topics, different achievement and ability dimensions,
and be used for a variety of purposes and with
different student groups. They may be classified tightly or
be relatively independent of any subject or skill taxonomy.

Any items incorporated in a bank should have been
tried out on students and found acceptable: they will be
of proven quality. They should also have descriptive and
statistical information detailing certain important properties
and characteristics (see Appendix).

How are item banks developed? 

THE SYSTEMATIC DEVELOPMENT of item banks has
been attempted in many countries since the late 60s
when the pioneering work of Wood and Skurnik in developing
a mathematics item bank was undertaken at the National
Foundation for Educational Research (NFER) in England.
Different strategies have been adopted by different
developers, but experience has shown that the steps in
the flow chart provide the essential sequence of an item
bank's development.

Flow Chart: How to Build an Item Bank

[Flow chart not reproduced; the only legible stage is item writing.]

For a calibrated bank the following stages would be
substituted in the screened area of the diagram.

[Diagram not reproduced; legible stages: recalibration, data for
calibration, bank, discard, analysis of monitoring data.]




Clearly, a bank of this kind is not the work of one developer.
It does not represent one person's notion of what
the bank should contain in terms of content, the levels of
cognition to be tested, or the mode of testing. Instead, the
item collection should represent the consensus of knowledgeable
professionals: practising teachers, curriculum
specialists, advisers and, for some types of assessment,
educational psychologists.

What kinds of items should be in
a bank?

As THE definition suggests, any kind of test question
can be incorporated in an item bank. Multiple-choice
items have tended to outnumber other types in established
banks primarily because they are objective, require precise
responses, are simple to modify, are easier to validate,
and fit readily into most classification systems. They are
admirably suited to testing various aspects of mathematics
and science, the two subject areas for which most item
banks have been developed. However, banks have been
developed for the social sciences, too, and for subject
areas with practical components such as woodwork and
homecraft.

Open-ended questions, like those requiring paragraph
answers or essays, can also be banked, but marking outlines
or guides need to accompany them, otherwise
idiosyncratic interpretations of what constituted an
'adequate' or 'acceptable' answer would complicate matters.
Indeed, Wood has suggested that there is no reason
why tasks ". . . such as oral questions, dictation, musical
passages, project topics, practical experiments, and so
on, should not be stored, providing some quantitative
evaluation of them can be made."

How many items should an item
bank have?

THAT DEPENDS on several things: the bank's intended
purpose(s), who is using the bank, whether the bank
is computerised or not, and similar considerations. Probably
the best rule is, the more items the better, assuming,
of course, that all items are of proven quality, are valid in
terms of content, and that the classification and retrieval
systems are not overwhelmed by sheer numbers of items
so that they fail to operate efficiently.

Crude guidelines for the number of items required to
make up a bank, as reported by Prosser, are: 10 items for
every one that could be used for any one test, and 50
items for every hour of classroom instruction on a particular
topic. Where banks are referenced to stated learning (or
instructional) objectives a minimum of five items per objective
is suggested.
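
The three rules of thumb above amount to simple multiplication. The sketch below is only an illustration of that arithmetic: the function name, its parameters and the sample figures (a 40-item test, six hours of instruction, 25 objectives) are assumptions made for the example, not part of Prosser's account.

```python
def suggested_bank_size(items_per_test=0, instruction_hours=0, objectives=0):
    """Return the rough minimum bank size implied by each rule of thumb."""
    return {
        # 10 banked items for every item used in any one test
        "per_test_rule": 10 * items_per_test,
        # 50 banked items for every hour of instruction on a topic
        "per_hour_rule": 50 * instruction_hours,
        # at least 5 banked items for every stated objective
        "per_objective_rule": 5 * objectives,
    }


# Hypothetical planning figures, chosen only for illustration.
sizes = suggested_bank_size(items_per_test=40, instruction_hours=6, objectives=25)
for rule, minimum in sizes.items():
    print(f"{rule}: at least {minimum} items")
```

The rules give different answers for the same course, so a developer would probably treat the largest figure as the working target.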

Having a large item bank means that a user is more
likely to find a suitable match between available items and
what has been taught, the kind of test, and the level(s) of
difficulty required. A well-stocked bank also ensures that
items do not become overused, and this to some degree
gets around the problem of item security. Security is important
when, for example, item banks are used for moderation
purposes.

Where items are to be used in diagnosing learning difficulties
then many items for each sub-topic or instructional
objective are needed. But if a bank is to be tapped for the
purpose of making a comprehensive evaluation of a programme,
or a system-, state- or nation-wide survey, then
fewer items on each topic or objective, but a large number
overall, to assess the many different learning outcomes,
would be required; in fact a more general bank is needed
altogether.

Where do the items to stock a
bank come from?

REGRETTABLY, there are really very few sources of
sound, high-quality, and content-relevant test items.
Twenty or so commercially published sources are currently
available, and several of these are of Australasian origin
(see list provided). Other sources of items or ideas for
writing items are the workbooks or manuals that accompany
published textbooks or instructional materials, materials
produced by state departments of education, and
one-off tests found in university theses and diploma projects.

Obviously, using existing item collections is convenient
and cheap, when compared with the cost and effort of
starting from scratch in developing one's own item bank.
However, the disadvantage of this approach is exactly the
same one that turns some teachers off using nationally
standardized tests: the content frequently does not match
local curriculum objectives very well, nor do the questions
fit local contexts.

Where users decide to write their own items for a bank,
and thus increase the likelihood that the resulting item
collection will meet local needs and match curriculum content
and emphases more precisely, it is instructive to
examine already published items from reputable sources
(including standardized tests) as exemplars and as profitable
starting points for ideas and format. But, let us not
mislead by minimizing the magnitude of the task. Many
test specialists have commented on the difficulties encountered
in developing local item banks.

Paramount is the problem of obtaining enough high quality
items that are unambiguous measures of curriculum
objectives (other than knowledge or the recall of factual
information which are assessed relatively easily).
Hard-pressed teachers working alone, or in small groups, rarely
have the time, even if they have the skill, to devise the
hundreds of questions required that meet the recognized
criteria of "a good item". And, even where questions are
produced locally, there is no guarantee that the items written
by one teacher will be acceptable to another.

Let it also be remembered that, no matter how sophisticated
the classification and retrieval systems, and despite
the enlightened application of the latest in electronic
gadgetry, no bank is better than the individual items that
make it up. It is at this fundamental level that creative
energy and expertise must be harnessed.

Nevertheless, there is nothing to stop local groups of
enthusiastic teachers working co-operatively over a period
of time to produce their own item banks, using their own
and others' questions. Teachers' involvement and sense
of ownership and acceptance of the resulting item bank(s)
is important, and the experience of participating in
item-writing workshops and in critiquing the efforts of others
(commonly called item panelling), will undoubtedly
broaden their perception on how specific skills and knowledge
domains may be reliably and validly assessed.
Information on item-writing and trial testing, and advice on
setting up and maintaining local item banks, is readily
available, both in Australia and New Zealand, to those
teachers who wish to try their hand.

How are items in a bank
classified?

As MILLMAN and Arter state: 'Classification is the key
that unlocks the item bank.' Unless items in a bank
can be retrieved quickly and precisely, the system will
grind to a halt; frustrated users will shun it. To ensure an
efficient workable system, stored items must be accompanied
by adequate descriptive information.

Two classification procedures have been found to work
well. One, a fixed category approach, classifies items by
content, often with sub-divisions into topics, subject matter
or objectives.

A second classification uses a keyword approach to
identify items. This system is flexible; it can handle items
which span categories; items can be described in great
detail and several kinds of information can be classified
together - subject matter, topic, ability/process, class
level(s), item setting, etc.

Modification of the system to keep pace with change is
also accomplished simply. But, clearly, it requires a relatively
powerful computer to operate such a system efficiently
and effectively, and this may rule it out for many
potential item bank users.

Further details on item descriptors (information and
characteristics) for use in banking classification systems
may be found in the Appendix.

Should item banks be
computerized?

AUTOMATION makes it feasible to accomplish several
important operations relatively easily. Item statistics
can be calculated and recorded regularly or at will. Weak
items, those that are seldom used by teachers, can be
automatically purged. Tests can be assembled by computer
to a teacher's exact specifications with considerable
savings in time, and the test text will be free of errors.
Adaptive, or tailored testing, even for individual students,
is readily handled by computer, as is scoring and fast
feedback to both students and teachers via printout. Where
a bank is calibrated, other advantages accrue (see below).

While it may appear that a computerized item bank is
the answer to every busy teacher's prayer for hassle-free
testing, there are disadvantages. Paramount is cost. Hiscox
warns us: 'The vision of a general purpose computerized
item bank is frequently simplistic or unjustifiable
based on the benefits it will produce compared to its cost.
There's simply no evidence that a reasonably priced computer
system can do all the item banking tasks we would
ask of it.' Another is the stark fact that computers, even
large and sophisticated machines, cannot handle much
of the stimulus material that is so much part of sound
testing practice in many subject areas: mathematical and
scientific diagrams (more especially those involving
diagonals and circles), pictures, line illustrations and cartoons,
graphs, special symbols or characters, maps, facsimile
texts, and so on.

Already some banks are available on floppy disk and
it appears that very soon items may be able to be accessed
by TV/phone. Educational Testing Service in Princeton,
USA, reports that its computer systems staff have automated
test development procedures. Such technology will,
of course, become available to all in the future. As advances
are made in computer technology and good item
banking software programs become available for reasonably
priced microcomputers, automating item banks at a
local level will become an increasingly attractive proposition.

Should item banks be calibrated?

IT DEPENDS on who you ask; it is a 'hot' topic amongst
test developers and psychometricians. Proponents of
item calibration, using the popular Rasch model or some
other model derived from item response theory, will tell
you that an uncalibrated bank is next-to-worthless and
those responsible for developing it are behind the times
and strictly amateurs. But critics who are sceptical of IRT
models and the apparent reduction of the rich array of
human abilities to homogeneous latent traits counter the
calibration enthusiast's claims. They say that there is insufficient
evidence that item response theory 'works' and
believe that the latent trait models generated are simplistic.
Special concern is voiced when such theories are applied
to achievement testing of the kind that concerns teachers
at every level of the education system.

To some extent, whether or not a bank should contain
calibrated items depends on the user's purpose(s). Item
banks serving classroom teachers' needs exclusively probably
do not need to be calibrated. Items for class tests
would more than likely be selected to meet subject/topic,
ability/process, objectives and/or traditional item difficulty
specifications. Statistical criteria, such as those provided
for a calibrated item collection, would be of little or no
concern. But, if the bank is for subject moderation, or for
efficient adaptive testing, or if it is important to have items
on a common scale (so that comparisons between students
taking different item combinations can be managed)
then calibration is obviously essential.

An advantage of calibrating that deserves mention is
the technique of sample-free item analysis. New, untried
items can be added to a bank without the large-scale
pre-testing that is required for conventional, uncalibrated
item banking which the flowchart showed.

When new items are to be trialled they are included with
already banked items of known characteristics in a test
which is then administered. On the basis of the results,
the new items' consistency with the bank is evaluated.
Where judged satisfactory, the new items can be calibrated
onto the bank for later use; any misfitting items are weeded
out.



How might an Item bank work? 

AT a fundamental level, an item bank can be simply a
systematized collection of items put together and
printed in a booklet or loose-leaf folder, a sort of mail-order
catalogue, which is made available to teachers. The
teacher who wants to use particular items for a test just
copies out what he or she considers appropriate, has it
typed up and then reproduced in some convenient way.
Better still, whole pages of suitable items direct from the
collection are photocopied to make up tests, thus avoiding
the introduction of typographical and similar errors and
saving the teacher's and typist's valuable time.

An objection to these methods is that the 'capital' of the
bank passes out of the hands of the organisers. There is
no feedback to them on how items are performing, no
provision for up-dating item statistics or for the modification
of items. Item security might also become a problem. In
fact, such a system is open to abuse once the item bank
has been disseminated. And, there is no lack of anecdotal
evidence and documented examples of the misuse of published
item banks!

A more sophisticated, but also more 'remote' approach
involves the teacher filling out a standard form specifying
fairly precisely the kinds of items required for a test. The
detailed form, a blueprint for the desired test, is sent to
the bank organisers in some centralised location who retrieve
appropriate items from the bank and compile the
test. The teacher using the service is provided with a
master copy of the test for reproduction.

At a school or local level, a card index system with items
categorized along several dimensions can be operated
along similar lines by a teacher or secretary with responsibility
for compiling test 'orders'.




A combination of these two procedures might have the
teacher referring to a master file or catalogue and specifying
(by code numbers or keywords) the items arranged in
a preferred order to make up a test. The coded information
is fed into a computer which prints out high-quality text
as a master for cheap reproduction by the teacher.
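
The procedure just described, looking items up by keyword in a catalogue and then listing code numbers in a preferred order, can be sketched in a few lines. Everything here is hypothetical: the `Item` record, the function names, the code numbers and the three sample questions are invented for illustration and do not describe any actual item banking system.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    code: str                                   # identification number in the catalogue
    text: str                                   # the question as it would be printed
    keywords: set = field(default_factory=set)  # descriptors used for retrieval


def find_items(bank, wanted_keywords):
    """Return the items that carry every requested keyword, ordered by code."""
    wanted = {k.lower() for k in wanted_keywords}
    hits = [item for item in bank
            if wanted <= {k.lower() for k in item.keywords}]
    return sorted(hits, key=lambda item: item.code)


def assemble_test(bank, codes):
    """Return items in exactly the order the teacher listed their codes."""
    by_code = {item.code: item for item in bank}
    missing = [c for c in codes if c not in by_code]
    if missing:
        raise KeyError(f"codes not in bank: {missing}")
    return [by_code[c] for c in codes]


# A tiny invented bank, for illustration only.
bank = [
    Item("M001", "Solve 3x + 2 = 11.", {"algebra", "linear equations", "form 3"}),
    Item("M002", "Find the area of a circle of radius 2 cm.", {"geometry", "area", "form 3"}),
    Item("M003", "Solve 2(x - 1) = 8.", {"algebra", "linear equations", "form 4"}),
]

matches = find_items(bank, ["algebra", "linear equations"])
test_paper = assemble_test(bank, ["M003", "M001"])
for number, item in enumerate(test_paper, start=1):
    print(f"{number}. {item.text}")
```

Keeping the keyword search separate from the ordered assembly mirrors the two steps in the text: the teacher first browses the catalogue, then submits the chosen codes.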

If the system for compiling tests at a centralized office
from 'controlled' item banks also has a scoring service,
then most of the objections to 'uncontrolled' published
item catalogues can be met. In such a system it will be
known which items are being used and by whom, and
feedback will be available to monitor item performance
and to up-date item statistics. Such a scheme, however,
does require knowledgeable staff to operate and maintain
the bank and ancillary services, and it is clearly more
expensive to run. But, in these days of 'user pays', it may
well be considered a viable operation.

Who would use an item bank and 
for what? 

IT is generally considered that item banks are potentially
useful for classroom teachers who: (i) wish to assess
their students' learning using measures with known
characteristics, (ii) are willing to examine closely what they
are teaching and to align their testing with it, (iii) want to
save time without sacrificing the quality of their assessments,
(iv) are able to appreciate the flexibility of item
banks to meet a variety of testing needs - from individualized
tests on single sub-topics (to diagnose specific
learning problems) to end-of-year surveys of achievement
(in one subject area for hundreds of students) and (v) wish
to retain control over what is to be tested, how it will be
tested, and when.

Others, such as those responsible for conducting or
monitoring national examinations, and educational administrators,
may also wish to exploit the flexibility and
potential of item banking. With 'internal assessment',
'reference tests', 'moderation', and 'school-based achievement',
being assessment terms bandied about today, it is
not hard to imagine that an informed use of item banking
might well assist those concerned with competence, comparability,
standards and similar weighty matters. The
thought is not new. The Schools Council in Britain (now
disbanded) was contemplating using item banking to moderate
Mode 3 examinations and to improve the GCE and
CSE examinations away back in the sixties. In 1972, Elley
and Livingstone, two NZCER research officers, discussed
the possibility of item banking as a method of moderation
in their publication 'External Examinations and Internal
Assessments'. In Australia, ACER began work on item
banks in the early seventies, and Tasmania, since 1972
through the Hobart Curriculum Centre under the leadership
of Don Palmer, has had centralized item banks for
several grade levels in a variety of subjects with clever
built-in methods of self-moderation and error-analysis.

What is the future of item 
banking? 

WRITING in 1974, Wood stated: 'Like fume-free cars
and the Kingdom of Heaven, question banking is
one of those ideas which has great appeal but which
people do little about.' He went on to lament the lack of
progress following the promising start that had been made
in England in developing mathematics item banks.

Thirteen years on: what has changed? Briefly, the many
benefits of item banks - principally their flexibility that
permits easy adjustment to a variety of instructional/assessment
settings - are slowly being recognised by the
teaching profession; experimentation with latent trait models
has led to more considered and balanced views of the
contribution they can make to the item banking enterprise;
published item banks have become increasingly available,
and, despite some blatant abuses, have found a niche in
many teachers' assessment armouries; fears held earlier
by some teachers of more assessment and a narrowing
of the curriculum to 'measurable outcomes', through the
relative ease of testing with item banks, have largely been
dispelled - it just hasn't happened; and the impact of
computers, of course, cannot be ignored - one can confidently
predict exciting developments on this front.

In summary, it would be fair to say that item banking
has far from universal acceptance in our schools and other
educational contexts. It has considerable unrealized potential,
and, optimistically, it does have a promising future!



Notes



Mr Neil Reid is Chief Research Officer: Measurement and Evaluation,
NZCER, Box 3237, Wellington, New Zealand.

Item calibration involves evaluating the fit of items to an item
response theory model. In the Rasch model it consists of estimating
the difficulty parameter value for each item. The great advantage
claimed for this particular procedure is that estimates of
item difficulty are independent of the particular students and
other items included in the calibration exercise.
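
For readers who want to see what the difficulty parameter does, the sketch below evaluates the standard form of the Rasch model, in which the probability of a correct answer depends only on the difference between a student's ability and the item's difficulty (both expressed on the same logit scale). It illustrates the model's response function only; it does not perform calibration, and the function names and sample values are assumptions made for the example.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model.

    Both arguments are on the same (logit) scale; only their
    difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Expected number of items answered correctly on a given set of items."""
    return sum(rasch_probability(ability, b) for b in difficulties)


# Two students of equal ability sitting different (hypothetical) item sets:
easy_set = [-1.0, -0.5, 0.0]
hard_set = [0.5, 1.0, 1.5]
print(round(expected_score(0.0, easy_set), 2))
print(round(expected_score(0.0, hard_set), 2))
```

Because every item's difficulty sits on one common scale, the lower expected score on the harder set can be attributed to the items rather than to the student, which is the kind of comparison between students taking different item combinations that calibration is meant to support.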

Appendix: Item Information

(Adapted from 'Issues in Item Banking', Journal of Educational
Measurement, 21:4, 315-330, 1984).
Accurate information about banked items is essential to ensure
the efficient operation of any item bank. Depending on the scale
and scope of the bank, the following information about each
item should be considered for entry and retrieval purposes.

Item Description 

1. Identification number, sign or symbol.
2. Content/text of item.
3. Keyed answer for objective items; model answer for
   paragraph/essay questions; typical incorrect responses for
   diagnostic test items.
4. Required associated stimulus material (graphs, illustrations,
   diagrams, etc.)
5. Cross-reference to other items or to common stimulus
   material (reading passage, map, diagram, etc.)
6. Ability/mental process classification.
7. Keyword(s) of item.
8. Author(s) of item.
9. Source of item (published/commercial, Departmental,
   school, etc.)
10. Revision or version of previous item.
11. Question type (multiple-choice, true-false, essay, etc.)
12. Type of student directions required for item use.
13. Curricular importance (essential, highly desirable, desirable,
    etc.)
14. Appropriate class/educational level.
15. Cross-reference to syllabus, textbooks, teacher's guide,
    manuals, workbooks, etc.
16. Security classification (secure, specified use, unrestricted
    use).
17. Date of item origination.
18. Pre-testing history (date(s), class level(s), number of
    students, etc.)
19. User comments, suggested modifications.

Item characteristics

1. Difficulty index*.
2. Discrimination index.
3. Item response model fit index (for Rasch-scaled or other
   IRT calibrated items).
4. Bias index.
5. Readability level index*.
6. Average time for completion.
7. Option response frequencies (particularly for diagnostic
   tests).
8. Information response frequencies (particularly for diagnostic
   tests).

* Sometimes judged rather than calculated. In such instances,
words (e.g., high, low, hard, easy), rather than figures should be
used for these estimates.
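
Two of the characteristics listed above are easy to calculate from a single trial of a multiple-choice item. The Appendix does not give formulas, so the sketch below assumes the conventional classical definition of the difficulty index (the proportion of students choosing the keyed answer) and simply tallies the options chosen; the function names and the ten sample responses are invented for illustration.

```python
from collections import Counter


def difficulty_index(responses, keyed_answer):
    """Proportion of students who chose the keyed answer.

    Assumes the classical definition; a higher value means an easier item.
    """
    if not responses:
        raise ValueError("no responses recorded")
    correct = sum(1 for r in responses if r == keyed_answer)
    return correct / len(responses)


def option_frequencies(responses):
    """Count how many students chose each option."""
    return dict(Counter(responses))


# Hypothetical responses of ten students to one item keyed 'B'.
responses = ["B", "B", "A", "B", "C", "B", "D", "B", "A", "B"]
print(difficulty_index(responses, "B"))   # 0.6
print(option_frequencies(responses))      # {'B': 6, 'A': 2, 'C': 1, 'D': 1}
```

A tally like this is what makes an item useful for diagnosis: a wrong option chosen by many students points to a shared misunderstanding.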



References

Elley, W.B. and Livingstone, I.D. (1972) External Examinations
and Internal Assessments. Wellington: NZCER.

Hiscox, M.D. (1983) 'A Balance Sheet for Educational Item Banking'.
Paper presented at the annual meeting of the NCME,
Montreal. The quotation is from page 11.

Millman, J. and Arter, J.A. (1984) 'Issues in Item Banking'. Journal
of Educational Measurement, 21:4, 315-330. The quotation is
from page [illegible].

Prosser, F. (1974) 'Item Banking' in Lippey, G. (Ed.)
Computer-Assisted Test Construction, Englewood Cliffs, N.J.: Educational
Technology Publications.

Wood, R. (1974) 'Question Banking' in Macintosh, H.G. (Ed.)
Techniques and Problems of Assessment. London: Edward
Arnold. The quotations are from pages 209 and 208.

Wood, R. and Skurnik, L.S. (1969) Item Banking. Slough: NFER.

Item Banks available from ACER

Australian Biology Test Item Bank
ACER 1984

Volume I: Year 11; Volume II: Year 12

Areas represented in the bank:

Volume I - Investigating the Living World

The Variety of Life, Organisms and Environments, Reproduction,
Nutrition, Development and Growth, Populations, Interaction
and Change in the Natural World, The Living World.

Volume II - The Organism

Integration and Regulation of Multicellular Organisms, Cellular
Processes, Heredity, Life - Its Continuity and Change, The
Human Species, Science and the Scientific Process.
Items requiring the 'correct response' and the 'incorrect
response' are represented in the Item Bank.

Australian Chemistry Test Item Bank

ACER 1982

Years 11 and 12

Areas represented in the bank:

Volume 1

Atomic Structure, Electronic Structure, The Periodic Table,
The Mole and Chemical Formulae, Molecular Compounds,
Infinite Arrays, Gases, Solutions, Surfaces, Stoichiometry,
Heat of Reaction, Chemical Equilibrium, Reaction Rates and
Acids and Bases.

Volume 2

Redox Reactions, Electrochemical Cells, Electrolysis, Measurement
and Chemical Techniques, Carbon Chemistry, Silicon
Chemistry, Nitrogen Chemistry, Phosphorus Chemistry,
Oxygen Chemistry, Sulfur Chemistry, Halogen Chemistry and
Metals.

The AIB Mathematics Items is currently being revised.
The AIB Social Studies Items is currently out of print.

Item Banks available from the New Zealand Department
of Education

Mathematics: Levels 1 to 9      French: Forms 3 to 6
Science: Forms 3 to 5           German: Form 3, Form 5



Copying Permitted 

© Copyright on this item is held by NZCER and ACER who
grant to all people actively engaged in education the right to
copy it in the interests of better teaching.



All classroom teachers make quantitative assessments of
how well their students are performing and frequently
must combine marks from several different essays, tests,
exercises or subjects to obtain an overall measure of
achievement. At the simplest level, a teacher may combine
the several marks for the essays or extended
answers that make up a formal examination. At a second
level a single score may be required to summarise a
pupil's performance over a year's study in one subject.
For example, after a year of teaching science, a teacher
may have end-of-term examination marks, and scores on
assignments, practical exercises, laboratory reports and
homework available. The teacher may also have measures
of oral class participation and the like available for
inclusion. At a third level, for the purpose of awarding
certificates, for accrediting NZ University Entrance, for
Queensland's Tertiary Entrance Score, for giving school
prizes or scholarships, a student's overall assessment
may be a combination of his marks in several different
subjects. The way in which marks are combined may considerably
influence the final assessment; in extreme cases
it could make more difference than the way the students
worked, or the way the teacher did the marking. In order
to be fair to all students it is important to understand the
factors which interact to affect the composite score.

The Validity of a Composite Score 

When SCORES from different tasks are combined the specific
information about how a student performed at a particular
task is lost and the composite score provides only a
summary of general performance. For example, the
teacher may have Helen's marks for a number of tasks in
French, such as knowledge of grammar, conversational
French, French literature, knowledge of French customs
and way of life, and oral and written French. A composite
score which condenses this information provides an indication
of her overall achievement, but the actual absolute
meaning of each of the scores has been obscured. She
may be top in the class in her knowledge of French customs
and way of life but extremely weak in her oral
French. In day-to-day teaching, retaining separate
assessments on the different tasks is often of more value
than attempting to determine a composite score. The
composite score masks a student's strengths and weaknesses
and being in a summarised form it may be relatively
meaningless. For example, Helen's overall mark in
French which places her among the top thirty percent of
students in her class disguises her extremely poor ability
to speak French.

When the overall assessment is a matter of combining
scores obtained on a number of distinct, but related tasks
within a course of study, it is reasonable to assume that
they are measuring attainment in the same area. This is
the level two situation - like Helen's French. If you require
a score which summarizes performance in a subject
it is quite justifiable to combine marks because the composite
score represents related measurements in the
same discipline. It will in fact tend to be more reliable
than a single mark. However, when the overall assessment
is a matter of combining scores from different subjects
the composite score that results has an even more
limited meaning. The subjects may differ greatly in the
demands they make on the students' knowledge and
skills and in adding their marks together it is rather like
'adding four apples to six pears - this can only be done
by calling them ten pieces of fruit and you no longer
know what sort they are'.

There are occasions when teachers are required to provide
a comprehensive rank order (order of merit) of students,
for example, when accrediting New Zealand University
Entrance, when awarding scholarships and when
determining who is to get school prizes, these awards being
based on a measure of overall academic achievement.
This must be undertaken with considerable caution.

The essential feature to recognize in combining scores
is that the measurement is essentially relative, not
absolute. A composite score permits us to compare the
standing of one individual against another, and to make
judgements involving 'more' or 'less', but the real or absolute
meaning of the scores is lost or masked.

Not All Scores May Be Combined 

A teacher uses a variety of assessment procedures for a
number of purposes. Not all 'bits' of assessment data
should be considered as candidates for including in a
composite score. Diagnostic tests, tests of mastery and
informal assessments of student progress are typically
'formative' and are useful as guides for further
instruction: the assessment is on-going. These measures should
not be combined. 'Summative' assessments, on the other
hand, provide estimates of student achievement at the
completion of a unit of work or at the end of term or end
of year; they are less frequent and more comprehensive.
Such assessments are norm-referenced and student
attainment is considered in relation to that of peers.
Norm-referenced assessments may be combined.

Determining a Composite Score 

When two or more sets of scores are to be combined a
decision must be made about the relative importance of
each test, exam, assessment or task and its desired
weighting in the composite score.

When you have made a decision about whether, say,
French vocabulary or knowledge of French customs is
more important, then, if that decision is not reflected in
the composite score you do not have a valid score.

Validity is a subjective judgement by the teacher, or
group of teachers, as to what weight each component
shall be given. Firstly, judge the relative importance of
each task. For example, if a teacher of English feels that
the ability to speak well is more important than a
knowledge of Wordsworth's poetry, a measure of a student's
conversational skill should have greater weight than a
score given for an essay on 'On Westminster Bridge'.
Secondly, carefully examine the 'scope' of each component.
If the score for a mid-year exam (assessing the first half of
the year's work) is to be added to the score on an
end-of-year exam (assessing the full year's work) what is the
composite score actually representing? By simply adding
the marks together, the first half of the year's work may
receive greater weight in the composite than the second
half of the year's work. It has been examined twice. Was
the early course work more important than the later
work? If it was not, the marks must be adjusted.

The importance or weighting of each component may
need to be tempered by a consideration of the reliability
of the various scores to be combined. Greater emphasis or
weight should be given to more reliable measures. In
general, reliability will be highest for a properly prepared
objective test, moderate for carefully marked essays and
lowest for informal, highly subjective appraisals of oral
contributions and participation in class.

These considerations make a valid composite score
possible and the next step is to make this possibility a
reality.

Factors to Consider in Combining Scores 

A test which is marked out of 100 will not count twice as
much as (will not have double the weight of) a test
marked out of 50. The marks when added together will
simply weight themselves naturally but not necessarily in
the desired way. Marking is essentially a relative (rank
ordering) process and a quite different procedure for
combining marks must be followed.

The essential features to consider when combining
scores are illustrated in the following example.

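[Example 1: table of scores and ranks for five students on Test 1,
Test 2 and the total, with the maximum possible score, mean and
standard deviation of each column. The individual figures are not
recoverable from the scan; the standard deviations are 4.4 for
Test 1 and 12.8 for Test 2.]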





Note: It is the teacher's intention that (in the composite score)
Test 1 would count twice as much as Test 2.



It can be seen that the rank order of students in Test 2
is the reverse of that in Test 1, yet when the scores are
totalled, the order for the composite score is the same as
for Test 2 despite the teacher's intention that Test 1
should count twice as much. In other words, Test 1 has
had no influence at all in deciding the final assessment
(order) of the five students and the same result would
have been obtained if only Test 2 scores had been used
and the other marks ignored. Obviously such a situation
is unsatisfactory and needs to be corrected.

When two or more complete sets of scores are to be
combined, the most important factor which influences
the effect each will have on the final result is the spread
(standard deviation) of marks in each set, not the possible
maximum score, nor the mean (average) of the marks.
The spread of scores in Test 2 (SD = 12.8) was
approximately three times greater than that for Test 1 (SD = 4.4).
Therefore, instead of the desired weighting of 2:1, the
actual weighting was 1:3. That is, Test 2 scores had three
times the influence of Test 1 in the composite score. In
general, then, the more the marks are spread out, the greater
is their influence (weight) in the composite.
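
The effect of spread can be checked numerically. The sketch below uses hypothetical marks, not the figures of Example 1, and assumes Python's standard `statistics` module for the standard deviation. Test 1 is tightly bunched and Test 2 widely spread, with the students ranked in opposite orders on the two tests.

```python
from statistics import pstdev

# Hypothetical marks for five students (not the article's Example 1 data).
# Test 1 is tightly bunched; Test 2 is widely spread and ranks the
# students in exactly the opposite order.
students = ["A", "B", "C", "D", "E"]
test1 = [61, 58, 55, 52, 49]
test2 = [20, 30, 40, 50, 60]


def rank_order(names, scores):
    """Return the names sorted from highest to lowest score."""
    return [n for n, _ in sorted(zip(names, scores), key=lambda p: -p[1])]


# Simply adding the raw marks, as a teacher might do.
totals = [a + b for a, b in zip(test1, test2)]

print("SD Test 1:", round(pstdev(test1), 1))
print("SD Test 2:", round(pstdev(test2), 1))
print("Order on Test 1:", rank_order(students, test1))
print("Order on Test 2:", rank_order(students, test2))
print("Order on total: ", rank_order(students, totals))
```

With these invented marks the order on the total is the order on Test 2, the test with the larger spread, whatever weighting the teacher had in mind.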

For a composite mark to reflect what the teacher
intends, the spread of scores of the separate measures must
be adjusted to reflect the appropriate relationships.
Although a maximum possible score of 100 permits a
greater spread than a maximum possible score of 50 it does
not automatically follow that the sets of scores will
weight themselves appropriately. In Example 1, for Test
1, although 100 was the maximum possible score the
marks were tightly bunched around 55 while for Test 2,
the scores were more dispersed.

Another factor to consider is the extent to which the
various components are related (intercorrelated). In
general, adjustments to the spread of a set of scores (to obtain
appropriate weighting) are more important when the
relationship between components is low or negative. For
example, adjustments are very important when combining,
say, marks for science and French, or scores on a test on
valency and a biology dissection. Adjustments are less
vital when combining scores on tasks within a subject.
French vocabulary and French prose are more closely
related, so departures in the natural weighting of
components from their desired weighting are likely to be less
serious than when scores from different subjects are to be
combined. While it would be unlikely for a person who
was top in one test in a given subject to be bottom in
another in the same subject area (as in Example 1 where
the two sets of scores are perfectly negatively correlated),
smaller discrepancies in rank will frequently occur and
are to be expected given the measurement error present
in even the most reliable of tests.

Procedure for Combining Scores

When all the students have taken the same series of tests, 
done the same af^signments, or written on the same essay 
topics, combining these marks is reasonably
straightforward once a decision has been made about the weighting
each score should have. However, when students do not
all attempt common tasks, e.g., if they answer optional
essays in an exam, or take different combinations of
subjects (as is usual in the senior secondary school), the
measure of overall attainment must take into account the
relative difficulty of each element as well as the variability
of each set of scores. Is French 'harder' than Art? Is one
optional assignment more difficult than another? And
what happens if some marks are missing because Harriet
was sick one day and Henry was moved into the class late
in the year? These problems are taken up in the following
sections.

1. For cases when all students have done the
same tests, assignments, etc., and done them
all

(i) Estimate the spread of scores on each measure. The
simplest estimate would be the range (the difference
between the highest and the lowest score), but, as this
estimate is determined by two scores only, if just one
student has done exceptionally well the range is
quite misleading. The best estimate is the standard
deviation. This index takes into account the spread
of each score from the mean. Computing a standard
deviation is a lengthy and tedious operation to do
arithmetically. A calculator with statistical functions
will make the computation easy. Without a
calculator a good approximation to the standard deviation
may be obtained in the following way:

1. Count the number of scores (the number of
students who did the test).

2. Sum the top sixth of scores.

3. Sum the bottom sixth of scores.

4. Subtract the sum of the bottom sixth from the
sum of the top sixth.

5. Divide this by half the number of scores (half the
number of students).

In words:

$$\text{Estimate of spread} = \frac{\text{sum of top sixth} - \text{sum of bottom sixth}}{\text{half the number of students}}$$

(ii) Determine the 'natural' weight of each set of marks, that
is, the ratio of their score spreads. For example, if the
standard deviation for Test 1 is 4.4 and the standard
deviation for Test 2 is 12.8, the natural weight of each
component is 4.4 to 12.8, or approximately 1:3.

(iii) Adjust each set of marks to obtain the desired weighting.
When each set of scores is to have equal weighting,
their spread of scores should be approximately equal
(the same SD for each). When the sets of marks are to
have different weightings, their spread of scores must
be in the same ratio as the weights required. If an
examination mark is to be added to a term test mark
and the examination is to count twice as much as the
test, the ratio of the spreads of the two sets of scores
would need to be 2:1 (e.g., examination SD = 4.4,
term test SD = 2.2).

Adjusting the spread of a set of scores is simple:
multiply or divide each score in a set by a constant.
The spread of scores that results will be greater or
less than the original spread by the factor that was
used in multiplying or dividing. Thus, multiplying a
set of scores by two will double the spread; dividing
a set of scores by two will halve the spread, and so
on. As all scores in a set are treated in the same way,
their absolute value will change, but their relative
standing or rank order will be unaltered. For this
exercise, it does not matter if a test that was once
marked out of 100 now gives scores greater than 100;
the marks are not being treated as absolute but
simply as an indication of which student did better than
another on that task.
(iv) Add the adjusted scores. This gives you the rank order
of students which represents a valid summary of
student overall attainment. These composite scores
may be converted to percentages, but it is important
to remember that the percentage score has no more
absolute meaning than the adjusted scores, but
provides a more familiar set of figures to make
comparisons between students.
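
Steps (i) to (iv) can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed implementation: the marks are hypothetical, the function names `estimate_spread` and `adjust` are invented for the example, and when the class size is not a multiple of six the size of a 'sixth' is rounded to the nearest whole student.

```python
from statistics import pstdev


def estimate_spread(scores):
    """Approximate the SD: (sum of top sixth - sum of bottom sixth) / (n / 2).

    Assumption: when the class size is not a multiple of six, the size of
    a "sixth" is rounded to the nearest whole student (at least one).
    """
    ordered = sorted(scores)
    n = len(ordered)
    k = max(1, round(n / 6))
    return (sum(ordered[-k:]) - sum(ordered[:k])) / (n / 2)


def adjust(scores, desired_weight, spread=pstdev):
    """Multiply every mark by a constant so the spread equals the weight."""
    factor = desired_weight / spread(scores)
    return [s * factor for s in scores]


# Hypothetical marks for six students on an examination and a term test.
exam = [52, 55, 49, 58, 61, 46]
term_test = [30, 70, 20, 50, 10, 60]

# The examination is to count twice as much as the term test (2:1).
exam_adj = adjust(exam, 2)
test_adj = adjust(term_test, 1)
composite = [e + t for e, t in zip(exam_adj, test_adj)]

print("Natural weights (SD):", round(pstdev(exam), 1), round(pstdev(term_test), 1))
print("Quick estimates:     ", estimate_spread(exam), estimate_spread(term_test))
print("Adjusted SD ratio:   ", round(pstdev(exam_adj) / pstdev(test_adj), 2))
print("Composite scores:    ", [round(c, 2) for c in composite])
```

The adjusted marks are small numbers because each set is scaled so that its SD equals its weight. As the text notes, only the relative standing matters, and the composite can be converted to percentages afterwards if a more familiar scale is wanted.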



2. When some data is missing, e.g., some
students missed some tests, assignments, etc.

(v) Add the adjusted scores (as in Step iv) and obtain the
Composite Average. Missing marks should not be treated
as zero but may be handled by computing a composite
average score: divide each student's composite total
(the adjusted scores totalled, Step iv) by the number
of components (tests, assignments, essays, etc.) for
which there are scores.
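
A minimal sketch of the composite average follows. The adjusted scores are hypothetical, the function name `composite_average` is invented for the example, and a missing mark is represented by `None` so that it is left out of both the total and the count.

```python
def composite_average(adjusted_scores):
    """Average the adjusted scores a student actually has.

    None marks a missing component; it is skipped rather than counted
    as zero. Returns None if the student has no scores at all.
    """
    present = [s for s in adjusted_scores if s is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Hypothetical adjusted scores: (examination, practical, assignment).
records = {
    "Harriet": (80.0, None, 44.0),  # missed the practical
    "Henry": (None, 30.0, 36.0),    # joined the class after the exam
    "Jo": (70.0, 25.0, 40.0),       # has all three marks
}

for name, scores in records.items():
    print(name, composite_average(scores))
```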

A Worked Example: (see next page) The teacher wants to
combine the scores on three tasks (an examination, a
practical exercise and an assignment) so that the practical
exercise and the assignment are weighted equally and the
examination counts twice as much as the other two, 2:1:1.

The ratio of the 'natural' weights of the three elements is
determined from their standard deviations: 4.0 to 8.0 to
4.2, that is, approximately, 1:2:1. The required ratio of
weights, 2:1:1, may be most readily obtained by
multiplying the examination scores by 2 and by dividing the set of
scores for the practical exercise by 2. The scores for the
independent assignment are unchanged. The three scores
are then added together. To take account of absences the
composite total is averaged to obtain one score which
reflects the students' performance in the way you, as the
teacher, have determined.

A comparison of the rank order when the unweighted
marks are added together (J, [remaining order illegible]) and the
rank order when the scores are adjusted and
appropriately weighted and when absences are allowed for
(J,L,M,B,G,I,H,K) shows that all students except one
have different places in class. This illustrates the
importance of making adjustments to obtain a valid measure of
overall achievement.

[Worked example table: unweighted and weighted scores for eight
students on the examination, practical exercise and assignment,
with totals, composite averages, means and standard deviations
(4.0, 8.0 and 4.2 for the unweighted marks). The individual
figures are not recoverable from the scan.]

Note: Keith had been absent for the exam and another student
(name illegible) was absent for the assignment.



3. For cases when students have done different
combinations of subjects

When students take optional tasks, such as optional
essay topics, or different selections of subjects, the ratio
of score spreads alone (Step (iii) above) gives no
guarantee that the effective weighting desired will result. The
average score of each measure also becomes important.
This is because differences in scores of students taking
different components may reflect differences in the
difficulty of the tasks, or differences in teacher marking
standards, rather than simply differences in the ability
of the students.

(vi) Optional topics with equal weight. One procedure for
ensuring all components have equal weighting is to
convert each set of scores to standard scores, that is,
scale them to the same mean and standard
deviation. This procedure has the effect of reducing each
set of scores to a common scale which will have equal
weighting when added together. Converting raw
scores to standard scores may be done either by
formula or by graph.

(i) Standardizing scores by use of a formula. The mean
($\bar{X}_s$) and standard deviation ($SD_s$) to which a set of
raw scores may be scaled is flexible. A $\bar{X}_s$ of 60 and
$SD_s$ of 12 is frequently suggested as suitable for
assessment programmes in the school. This
distribution provides scaled scores normally within the
limits of 0 and 100. Tables which convert raw scores to
this scale have been published (e.g., Queensland
Department of Education, 1972, pp.62-63) and score
conversion is therefore reasonably straightforward.
T-scores, with a $\bar{X}_s$ of 50 and $SD_s$ of 10, are also
frequently used as an alternative scaled score
distribution.

Example: To compute the scaled score for each student
the following information is necessary.

1. The student's raw score ($X_r$)

2. The mean of the set of raw scores ($\bar{X}_r$)

3. The standard deviation (or estimate) for the set of raw
scores ($SD_r$)

The scaled score is then computed as:

Step 1. Subtract the mean from the raw score ($X_r - \bar{X}_r$)

2. Divide the difference by $SD_r$

3. Multiply the result of (2) by the scaled SD ($SD_s$)

4. Add the result of (3) to the scaled mean ($\bar{X}_s$)

That is:

$$X_s = \bar{X}_s + SD_s \, \frac{(X_r - \bar{X}_r)}{SD_r}$$

If $\bar{X}_s$ and $SD_s$ are to be 60 and 12, respectively, then, for
example, if the mean and standard deviation for a set of
raw scores are $\bar{X}_r = 25$ and $SD_r = 5$, a raw score of 35 is
converted to a scaled score as:

$$X_s = 60 + 12 \, \frac{(35 - 25)}{5} = 60 + 12(2) = 84$$

and a raw score of 18 is converted to a scaled score as:

$$X_s = 60 + 12 \, \frac{(18 - 25)}{5} = 60 + 12(-1.4) = 60 - 16.8 = 43.2$$
If a conversion table, such as the one referred to above, is
available, the only computation required is that in step 1.
The difference score obtained is then entered into the
appropriate table and the scaled score read off.
(ii) Standardizing scores by use of a graph. A graph is drawn
with one axis representing the raw scores and the other
representing the scaled scores. Such a graph can be used
at a variety of levels of sophistication. In its simplest form
three score equivalents are selected to represent the
mean and scores one standard deviation above and one
standard deviation below the mean for the set of raw
scores and the corresponding scaled scores. In order for
all the raw scores to be scaled, the three points which
correspond to these three pairs of scores are plotted. Next, a
line is drawn through the three points; the graph can
then be used to convert any given raw score in a set of
scores to a scaled score.

Example: A set of raw scores, with a mean of 23 and a SD of 3, to be
converted to a set of scaled scores with a mean of 60 and SD of 12.

[Graph: raw scores plotted against scaled scores, with a straight
line drawn through the points for the mean and for scores ± 1 SD,
i.e. scaled scores of 48, 60 and 72.]



Again, the absolute value of scores will alter but the
relative position (rank) of each student remains unchanged.



A Worked Example. Essays 1 and 2 are optional, Essay 3 is compulsory. Each essay is to have the same weight.

[Table: raw and scaled scores on Essays 1, 2 and 3 for each student
(Alan to Harry), with means, standard deviations and composite
averages. The figures are not recoverable from the scan.]



For all essays to be equally weighted in the composite
score, each set of scores may be converted to a common
distribution (e.g., $\bar{X}_s = 60$, $SD_s = 12$) and then added.
However, because Essays 1 and 2 are optional
components and Essay 3 is common to all students, all that is
necessary for equal weighting is (a) that the spread of
scores for the three essays be approximately equivalent
and (b) that the mean scores for Essays 1 and 2 be
equivalent. This may be achieved by scaling the scores for
Essays 1 and 2 to, e.g., a $\bar{X}_s$ of 50 and the same SD as
Essay 3. The adjusted scores are then added
together and averaged to obtain the composite average
score. Thus: When components are to have equal weights but
not all students do the same tests, essays, exercises, etc., any
particular common task must have the same numerical influence as the
optional tasks.
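
The scaling of the two optional essays can be sketched as follows. The marks and student labels are hypothetical, the helper name `rescale` is invented for the example, and `None` stands for an essay the student did not choose.

```python
from statistics import mean, pstdev


def rescale(scores, new_mean, new_sd):
    """Scale the non-missing scores to a new mean and SD; keep None as None."""
    present = [s for s in scores if s is not None]
    m, sd = mean(present), pstdev(present)
    return [None if s is None else new_mean + new_sd * (s - m) / sd
            for s in scores]


# Hypothetical marks; None means the student did not choose that essay.
names = ["P", "Q", "R", "S", "T", "U"]
essay1 = [28, 20, 32, None, None, None]
essay2 = [None, None, None, 60, 45, 51]
essay3 = [45, 40, 52, 47, 38, 48]  # compulsory, left unchanged

target_sd = pstdev(essay3)  # give the optional essays Essay 3's spread
e1 = rescale(essay1, 50, target_sd)
e2 = rescale(essay2, 50, target_sd)

# Composite average over the essays each student actually wrote.
for i, name in enumerate(names):
    marks = [m for m in (e1[i], e2[i], essay3[i]) if m is not None]
    print(name, round(sum(marks) / len(marks), 1))
```

After scaling, the two optional essays share a mean of 50 and the spread of Essay 3, so a student gains or loses nothing merely through the choice of topic.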

(vii) Optional components with different weightings. Combining
scores from different subject areas. It is argued (e.g.,
Thyne, 1974) that by their very nature, optional
tasks must be considered of equivalent importance.
This may be the case for optional tasks within a
course of study in which the tasks are judged to be
of comparable difficulty. However, the same
assumption of comparable difficulty cannot be
extended to the situation in which scores from
different subjects are to be combined. While there is
extensive debate about the practice of combining
scores from different subjects, the complexity of the
problem increases when not all students study the
same subjects but have different combinations of
subjects, as is common in the senior secondary
school.

The typical situation which arises is that students
are to be compared with each other for some
scholarship, award or certificate and yet they have taken
different combinations of subjects. We know that
different subjects demand different abilities and so
some attempt to take account of 'quality' differences
in the students is essential if scores in different
subjects are to be added. In order to establish
comparability of abilities between different subject groups
some form of moderation is required.

The essential feature of moderation is to
determine the relative level and spread of abilities of
students in different groups. This may be
accomplished in a number of ways. An appropriate reference
test* may be administered to all students. The raw
scores for each group of students may then be scaled
to the mean and standard deviation of scores that
group obtained on the reference test. As before, the
mean and scores one standard deviation above and
below the mean on each measure may be used as
points for constructing the graph from which the
raw scores may be scaled.

In the absence of a common reference test, an
alternative procedure for establishing comparability of
abilities is to scrutinize groups of students who are
doing the same combination of subjects and check
that the mean scores are the same. If, for example,
the mean score for students studying German is 50
but the mean score for geography, English and
economics for the same group of students is 39, 40
and 41, respectively, the German marks may be
scaled to a mean of approximately 40.

*Extensive debate surrounds the issue of appropriate reference
tests for moderating purposes. It is not proposed to enter into a
discussion of the variety of measures possible. The reader is
referred to Elley and Livingstone (1972) for a treatment of this
topic.

These approaches have some limitations,
particularly for those subjects which may rely heavily on
special abilities not common to other subjects, e.g.,
music, technical drawing and art. The students may
be very able in that particular discipline without
being equally able in other academic subjects.

A Worked Example: The following estimates of overall
achievement in each of five subjects are listed (i). An
appropriate reference test has been administered to all
students. The mean and standard deviation ($\bar{X}$, SD) for
each group of students for French, maths, history,
physics and German are (55, 10), (53, 15), (50, 20), (65, 5) and
(60, 10), respectively.

All scores in each subject are scaled to the appropriate
mean and standard deviation and then summed.
Because the students are presenting different numbers of
scores for combination, an average composite score is then
computed. In this way, decisions about a student's overall
attainment in relation to that of his or her peers may be made
fairly and validly.
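
The moderation procedure can be sketched as below. The reference-test means and SDs for history and physics are the pairs quoted in the worked example; the raw marks are hypothetical, and the helper name `scale_to` is invented for the illustration.

```python
from statistics import mean, pstdev


def scale_to(scores, new_mean, new_sd):
    """Scale a set of scores to a given mean and standard deviation."""
    m, sd = mean(scores), pstdev(scores)
    return [new_mean + new_sd * (s - m) / sd for s in scores]


# Reference-test mean and SD for the group taking each subject
# (two of the pairs quoted in the worked example).
reference = {"History": (50, 20), "Physics": (65, 5)}

# Hypothetical raw marks: subject -> {student: mark}.
raw = {
    "History": {"Alan": 40, "Cathy": 70, "Gwen": 55},
    "Physics": {"Alan": 68, "Dianne": 80, "Gwen": 74},
}

scaled = {}  # student -> list of moderated scores
for subject, marks in raw.items():
    ref_mean, ref_sd = reference[subject]
    moderated = scale_to(list(marks.values()), ref_mean, ref_sd)
    for student, score in zip(marks, moderated):
        scaled.setdefault(student, []).append(score)

# Average composite score over the subjects each student presents.
for student, scores in scaled.items():
    print(student, round(sum(scores) / len(scores), 1))
```

Each subject group ends up with the level and spread it showed on the common reference test, and the average allows for students presenting different numbers of subjects.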

















[Table: (i) raw scores and (ii) scaled scores for each student in
French, maths, history, physics and German, with totals, average
composite scores, and the mean and standard deviation of each
subject. The individual figures are not recoverable from the scan.]







This paper has attempted to alert the reader to the
fundamental issues and procedural steps in the combination of
scores so that a fair and valid estimate of overall
attainment for the individual student may be achieved.

Notes 

Alison Gilmore is a Test Development Officer with the New
Zealand Council for Educational Research.

The quotation about apples and pears is from the Schools
Council (UK) Examination Bulletin 32, mentioned below.

Department of Education, Queensland. Moderation Within
Schools, Research and Curriculum Branch, 1972.

Dunn, S.S. Measurement and Evaluation in the Secondary School,
ACER, 1967.

Elley, W.B. and Livingstone, I.D. External Examinations and
Internal Assessments, NZCER, Wellington, 1972.

I MillTtian 1- " f he Assj^nmrnl i^f Sehiuil Marks" in 
(.Jfiinfinnl \.V . (I'd) Af<Ms//nw?.7 i^^ui } lUn'iUion, 

I he MamnHan C imipanv. 1 t>ndtMi. 1*^H. 

Payne, D.A. The Assessment of Learning, D.C. Heath and
Company, Massachusetts, 1974.

Priddle, B. and White, G. Testing in Practice, Heinemann
Educational Books, Auckland, 1972.

Scannell, D.P. and Tracy, D.B. Testing and Measurement in the
Classroom, Houghton Mifflin Co., Boston, 1975.

Schools Council Examination Bulletin 32, Evans/Methuen
Educational, London.
Thyne, J.M. Principles of Examining, University of London Press
Ltd, London, 1974.



EVALUATING
WRITING

[Illustration: title page of "A New Guide to the English Tongue",
in five parts, by Thomas Dilworth, eighth edition, printed and
sold by B. Franklin, Philadelphia.]



Evaluating Writing 



By David Philips 
NZCER 



Introduction 

Teachers need marking techniques. Plenty are available, but
which are the best? That depends on what you want them for.
If you want to assess

1 Normal coursework writing - what the children do every
day - then there will be two jobs for the marking to do:

(a) diagnose faults (so that we can give exercises to
correct them)

(b) ascertain progress (so we can see if our teaching is
successful).

If you want to assess

2 A year's work - or even a term's - then we will be
looking for a technique which will:

(c) assess the child's progress compared with his or her
earlier performance

(d) possibly provide a comparison of performance
against the rest of the class, or the rest of his or
her age group.

Choose your assessment technique carefully - it must fit
the task (one of the above four), the class level, and the pupil.
'In evaluating writing we are assessing much more than their
grasp of a programme: we are evaluating the students
themselves.'



Why is Writing Difficult to Assess? 

Despite the excellent research of Janet Emig, Donald Graves
and others, the writing process itself is still largely a mystery.
We know that it is a very complex process requiring the
mastery of a variety of interrelated skills. Apart from the
essential inputs of reading and thinking, skills such as
knowing how to organize material, awareness of the teacher's
goal, understanding the purpose of the specific writing task,
all play an important part in creating written material. It is
not surprising, therefore, to find that pupils vary considerably
in their ability to write. While some pupils improve their
writing with relative ease, others consistently find writing a
difficult enterprise. It is important to remember, though, that
writing skill develops. It is not a static ability which one either
has or has not. Consequently the end-point reached will
inevitably vary from one person to the next. Since writing
skills are usually in a state of change, and fluency takes time to
develop, it is essential that both diagnostic and
end-of-the-year assessments be made with the intention of encouraging
the burgeoning writer.

Writing has both 'deep' and 'surface' features. The 'deep'
ones include the purpose of the writing, its content and
structure. The 'surface' ones are the orthographic or
transcriptional aspects of spelling, punctuation,
capitalization and grammar. It is so easy for teachers to focus
on the 'surface' features and so easy for the pupils to think
they are the only important aspects that both teachers and
pupils may lose sight of the basic purpose of the writing.
Collins and Gentner have drawn attention to this
phenomenon, and have labelled it 'downsliding'.

One great difficulty for writers is maintaining connective flow.
The relationships between ideas must be made clear. Yet in order
to write about an idea, the idea must be expanded downward into
paragraphs, sentences, words and letters. Sometimes writers -
particularly children - become lost in the process of downward
expansion and lose sight of the high-level relationships they
originally wanted to express. Downsliding - the phenomenon of
getting pulled into lower and more local levels of task processing
- is a very common problem in writing and in other domains as
well. If a teacher emphasizes accuracy in spelling and grammar it
will reinforce the natural tendency toward downsliding. The
overall result will be that children focus almost exclusively on
lower-level task components when they write.
Of course, it is often very difficult to avoid emphasizing
those features of pupils' writing which are most clearly in
error. But it would be unfortunate indeed if the error-seeking
red pen was not tempered with a sympathetic attempt to
improve writing skills beyond the merely 'surface'
characteristics. It is not an easy job to mark the 'deep'
features. But they do have to be assessed if we are to be
helpful.



1 Evaluating Performance During the Year

Assessing the Developing Writer 

Writers differ in their learning rate and in their potential for
improvement. However, there is little point in prejudging a
pupil's likely achievement in writing and teaching to that
expectation. Instead, try to pay close attention to overall
development and focus on specific writing difficulties.
'Composing a piece in any mode is a complex linguistic,
experimental, cognitive, affective and scribal act.' (Cooper,
'Measuring Growth in Writing', English Journal, Vol. 64, No.
3, March 1975, p. 112.) Ask yourself:

Surface problems

Has this pupil an adequately legible style of handwriting?

How extensive is the pupil's command of language? Is she or
he having difficulty with spelling, subject-verb agreement,
sentence structure? Has she or he had sufficient practice
with this mode of writing?

Deep problems 

Can this pupil stand back from present circumstances and
order thoughts in an appropriate manner? Does he or she
know how to romper written work? Is choice of content
(within the piece) or organisation of the content giving
problems?






Has the pupil had enough experience to write on this topic?
Is the pupil sufficiently motivated to write on this topic? Is
he or she having difficulties with parents, peers, etc., which
might affect performance?

This preliminary look may reveal that the student has
difficulties. If so, steps will have to be taken to provide
appropriate assistance. The teacher using this technique
interprets the pupil's writing as part of a complex series of
interrelating factors, each an integral part of his writing
ability. Further, the pupil's progress is gauged against several
variables. A mark in the teacher's markbook which is a simple
sum of the number of errors the pupil has made is not nearly
as useful.

Revision is an integral part of most writing. Therefore
another important procedure to follow is to allow the student
to revise and re-work part of the writing if necessary, and
consult peers, and the teacher, on the content and form of the
written work. (The research of Donald Graves in this area is
especially instructive. Though it deals with pupils in their
first few years at school the conclusions are universal.) This
procedure allows pupils to view writing as a continuous
process with several mutually supportive stages, rather than
simply as a one-off type of exercise done merely for the
teacher's benefit.

Assessing the student's work during the year will entail
these activities, in this approximate order:

(1) Consider the surface and deep features on page 2.
    Carefully note the pupil's development or behaviour within
    each area.

(2) If the content of the pupil's writing seems unrelated to
    the topics given, check the questions you asked, and
    the instructions you gave, for ambiguity. Make sure
    that the tasks set are within the students' capabilities,
    yet challenging.

(3) Discuss the more immediate difficulties with the
    student; provide a willing ear; be supportive.

(4) Correct 'surface' errors but by focussing on only one or
    two specific examples each time until the student
    reaches an appropriate level of mastery in them, e.g.,
    capital letters for a few days or weeks, then commas.

(5) Take remedial action where necessary over specific
    thorny problems (i.e., by giving extra instruction and
    help, for example, with persistent poor spelling).

(6) Keep a careful written record of the student's
    improvement in addition to the first 'diagnosis'; update it
    regularly (e.g., 3 or 4 times a term).

Although this strategy requires considerable care, it is
designed to encourage the student in a positive fashion rather
than to inhibit development. Comments on the pupils'
writing, whether verbal or written, should be selective rather
than comprehensive. This is so that the pupil can focus on
separate aspects of performance and gradually bring about
improvement in them.

Methods of Marking 

If marks or grades have to be awarded to pupils' written work,
bear in mind some of the findings from research on the
marking of essays. Even though most of this research has been
concerned with secondary or tertiary level students, it is a
useful reminder of the fallibility of the most carefully
prepared teacher!

It has been established, for example, that the same piece of
written work will not always receive the same mark, even
when marked by the same marker. The order in which several
essays are assessed may affect the quality of the mark
awarded; thus a series of good essays may build up the
marker's expectations so that when a poor essay comes along
it will obtain a much lower mark than if it had been preceded
by a series of mediocre essays; the reverse also applies. If
essays are marked over several days, by the last day of
marking the assessments are likely to be much less consistent
than they would have been earlier in the piece. However, this
is unlikely to be a serious problem when marking occurs in a
single session, and provided class papers are not always
marked in the same order. If papers are marked in the same
order (e.g., alphabetically or by designated groups), the biases
introduced due to marking order are likely to be significant.
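
One simple way to avoid always marking papers in the same order is to shuffle the pile before each session. The following is only a minimal sketch in Python; the function name `marking_order` and the pupil names are hypothetical and are not taken from the research cited above.

```python
import random

def marking_order(papers, seed=None):
    """Return the papers in a fresh random order for one marking session.

    The original list is left untouched; pass a seed only if the same
    order needs to be reproduced later.
    """
    rng = random.Random(seed)
    return rng.sample(papers, k=len(papers))

# Hypothetical class list, normally kept in alphabetical order.
class_papers = ["Aroha", "Ben", "Carla", "Dev", "Ema"]

# A different order is drawn for each assignment, so no pupil is
# always marked first or always marked last.
first_assignment = marking_order(class_papers)
second_assignment = marking_order(class_papers)
print(first_assignment)
print(second_assignment)
```

Shuffling does not remove the effect of neighbouring essays on any one mark, but it should stop that effect from falling on the same pupils every time.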

A more pressing problem for the classroom teacher is
deciding which criteria ought to be applied to any given piece
of writing. What features should be examined? How
inadequate does a pupil's performance have to be before some
kind of assistance becomes necessary?

(i) Rev^ng Criteria of Marking 

Complete agreement on the most appropriate features to
assess does not exist. Different markers give more or less
weighting to different criteria. For example, two secondary
school English teachers may each have a pupil who insists on
using an ampersand (&) instead of writing 'and' in his essay.
The first teacher may consider this abbreviated technique to
be a major breach of convention, and mark the pupil more
harshly as a result. The second teacher may well ignore the
ampersands and when handing back the pupil's essay simply
make a passing reference to it. Some markers are consistently
bothered by spelling mistakes: the attitude seems to be that
incorrect spelling has to be stamped out, so the red marks will
fly onto the pupil's essay. Although these examples may
appear to be relatively trivial, research has shown that the
consistent breaking of the conventions of spelling and
punctuation can lead to reduced marks since the number of
errors (even though they might be minor ones) inhibits the
marker and also directs his or her attention away from the
quality of ideas or content of the writing. Many studies, for
example, have shown that handwriting quality also has an
influence on the marks awarded to essays so care is required to
ensure that students with poorer handwriting, spelling and
punctuation do not suffer in their marks as a result.

Another problem, and one pupils often bewail, is marking
criteria being inconsistently applied. Naturally, teachers
apply different criteria depending on the aims of a particular
writing exercise. A piece of creative writing such as a short
story is likely to be examined for its quality of ideas, since any
writing inaccuracies can always be tidied up. After all,
published writers have the service of editors and secretaries.
On the other hand, a piece of descriptive writing (e.g., an
account of a holiday, or the construction of a familiar object)
is more likely to be assessed on the basis of the accuracy of the



events recounted or the orderly discussion of the steps
involved in the activity concerned. At the secondary level
essays may well be examined for their structural features: how
well ideas hang together, whether the topic is appropriately
introduced and covered to an adequate extent, etc. For all
these types of exercise, the presentation (legibility,
appropriate location of headings and margins, etc.) and the
orthographic features (spelling, punctuation, grammatical
accuracy), while part of the 'total communication', are not the
most significant elements in the overall pattern of writing
development. In any assessment scheme, therefore, they
should not assume undue importance.

To sum up, the first step is to clarify the purpose of the piece
of writing which is to be assessed. Some common purposes
(following Stibbs) are: the recording of information for the
writer's own use; recording information for someone else's
use; helping the writer to sort out his own experience and
thoughts; helping the writer to understand the experiences of
others; symbolising experience in particular ways;
describing; instructing; persuading.

The writing itself may be in any of several forms (such as
notes, summaries, reports, poems, plays, stories, descriptive
accounts of people, places or objects, letters or lists of
instructions), so the criteria of assessment will need to be
adapted to suit both the form of the writing and its purpose. A
set of instructions, for example, would need to be well
laid-out and sequenced accurately for ease of interpretation.
Assessment would, therefore, tend to emphasize those
features. On the other hand, an essay about a recently read
book might be assessed according to how well the writer
summarises the book's contents and discusses his or her own
reaction to it. Paragraphing and coherence would also be
important.

Teachers must ensure that their pupils know what is going
to be examined in their written work: for example, that this is
a descriptive piece and accuracy of information and orderly
discussion will count highest. Although it is often said that
writing is a game and a test of one's ability to guess what the
teacher wants, this attitude is not a worthy one. Criteria
should be made explicit, and a careful watch has to be kept to
make sure that unconscious criteria are not assuming greater
importance than stated ones. To this end, markers need to (a)
examine their 'standards', through self-examination; (b)
communicate their criteria to their pupils so that the pupils
can take them in; (c) keep a careful record of the kinds of
comments they make on each pupil's 'essays' and of what they
have done to assist the pupil's improvement.

(ii) Features of Writing 

Some elementary distinctions are useful. 

Mechanics 

The 'surface' features mentioned before are often known as
writing mechanics, or transcriptional features, since they
represent those aspects of writing which are readily
recognized as the basics of written communication. They
include:

a. Handwriting

The legibility of the writing will range from
uninterpretable to absolutely clear and easy to read. As it is
usually the first feature of a piece of writing to be noticed
(except, perhaps, for the overall layout of the whole
communication), and creates an impression in the reader's
mind about the writer's attitude to his or her task, it is easy
to be misled by it. Unless the pupil is being assessed on
handwriting alone, there seems to be little justification in
making it part of any evaluation of writing quality,
however hard the temptation to do so might be.

b. Punctuation

Inappropriate punctuation (ranging from the occasionally
omitted comma to inability to distinguish one sentence
from another; Mina Shaughnessy provides some
excellent examples of such problems in Errors and
Expectations) is another immediately recognizable feature
of pupils' writing, found as much in university students'
writing, it seems, as in primary schools. From the marker's
point of view, continually misplaced commas and/or
full-stops are a jarring note in any writing (with the exception
of deliberate experiments with language as in some forms
of 'creative' writing), since they actively impede
comprehension.

c. Spelling

Incorrect spelling is another easily identifiable feature of
writing, which many markers include as part of their
assessment. The range of performance will be from no
spelling mistakes to a plethora of errors. As with illegible
handwriting, spelling mistakes give markers a hard job as
they tend to counteract any positive impressions they
might hold about a piece of writing.

d. Grammatical Usage

Wrong tense, wrong pronoun, inappropriate subject-verb
agreement or other incorrect forms of words can also be
labelled 'surface' features since they are easily identified
and frequently commented upon, but seldom have the
effect of destroying ideas or logical sequence.

e. Sentence Structure

This element is often counted as a 'surface' feature,
including such things as sentence fragments, over use of
'and', misrelated clauses, etc. However, many of these
aspects can be interpreted as punctuation difficulties or
awkward usage.

While these features can easily impede understanding, and
are often referred to as carelessness, they have very little to do
with the content of a piece of writing, unless together they so
obscure a writer's message that it cannot be understood at all,
or only with extreme difficulty. It is best not to assess the
quality of a piece of writing on this basis alone.

Content

The 'deep' features, however, are much more difficult to
assess, and it is at this point that markers begin to diverge
even more widely. Any balanced assessment needs to include
a careful appraisal of these aspects. The problem is not so
much that markers disagree about the choice of criteria but
that they attach different weights to different traits. Although
this is virtually an insoluble problem the most significant
'deep' features which ought to be considered in any
assessment of writing are listed below without any attempt at
ranking their importance in relation to each other.



a. Ideas

This feature includes qualities such as relevance, accuracy,
fullness of treatment and originality of approach.
However, it is often extremely difficult to assess the
adequacy of a pupil's treatment of a topic. The negative
features are often as prominent as the positive: irrelevant
ideas, inaccurate representations of facts, excessive
emphasis on insignificant points, a confused attitude
towards the topic, etc. On the positive side satisfactory
responses often differ a great deal in their treatment of the
topic; how easy it is to give high marks to an essay in which
the point of view agrees with your own and to penalize
different approaches! It is also important to strike a
balance between sheer volume of ideas and the quality of
the ideas, hence the importance for some markers of the
rather nebulous feature called originality.

b. Organization

A survey conducted by the author in 1979 revealed that
university essay markers considered organization of
material to be the biggest stumbling-block for many
writers. The development of the ideas: how they are
structured within the essay, appropriately dividing ideas
into paragraphs, using contrast, introducing the main
features of the topic, putting ideas in an appropriate order
are all part of this feature. The haphazard grouping of ideas
is likely to be assessed somewhat harshly by many teachers,
while writing which 'flows' will probably be given higher
marks. Markers should take care to be consistent in
assessing this feature and consider if 'flowing' is more
important than having new and powerful ideas.

c. Word Choice

Aspects of this feature are the use of appropriate
terminology (i.e., adapted to the presumed audience of the
writing); words which can be readily understood, with
definitions included when deemed to be necessary; the
avoidance of ambiguity, hackneyed expressions and
redundancy; and the use of concise, clear words rather
than long, obscure ones. Marks must depend to a certain
extent on the clarity with which the purpose or context of
the writing was made clear to the students.

d. Style

Perhaps the most difficult feature to assess is the 'flavour'
of a piece of writing, i.e., how well the writer sustains his
attitude or commitment, the suitability of the writing for
its intended purpose and audience, the use of stylistic
devices and the fluency displayed. Judgements on style are
most likely to be highly subjective. The range of
possibilities confronting the writer is very wide, and the
effects of style on the marker are subject to influences
beyond knowing.

The extent to which these features play a part in the overall
assessment of the quality of a piece of writing remains a
matter for individual teachers to determine. It is worth
bearing in mind, however, that even though elaborate
marking schemes (some of which are discussed in the
following section) have been developed, the problem of
whether a particular piece of writing meets the criteria or not
still exists.



(iii) Marking Schemes

One of the hardest tasks an English teacher faces is deciding
which aspects of writing are most important. For example, is
style most important, or are the ideas the writer is putting
forward more so? Some of the marking schemes currently in
use will be briefly covered in this section in order to assist
thinking about this problem.

Broadly speaking, there are two types of marking schemes,
holistic (or impressionistic) and analytic (or atomistic). In
analytic marking, a series of judgements is made about the
pupil's writing according to a set of clearly specified criteria.
Marks are awarded for each criterion or essay feature
according to a predetermined scale, up to a stated maximum.
This is probably the most useful approach for evaluating
work done during the year, when diagnosis and appropriate
assistance are most important. Impressionistic marking, on
the other hand, simply requires a single judgement about the
quality of a piece of writing, and is most useful for
end-of-year assessments (see later section on Holistic Marking).

Analytic Marking 

As an example of an analytic marking scheme, take a recent
project undertaken in Canada, which developed criteria for
the evaluation of different modes of writing for grades (years)
7 and 8 (Forms 2 and 3). Each criterion has been elaborated to
make it easy to divide work into the categories of high,
medium and low. The introduction includes the comment
that 'we should like to see both teachers and students sensitive
to the fact that certain writing tasks call for different styles,
different language choices, and attention to particular skills
each related to the function or purpose of the writing and the
intended audience'. To illustrate the criteria, here is an
excerpt from Word Choice:

Imaginative and Varied Language Choices: Grade 8

High:    Words and images which provide sharp and concrete
         pictures for the reader are frequent.
         Occasional experiments in stretching vocabulary
         and images to include new or unusual words or
         images.
         Trite expressions are usually eliminated.
         Flowery excesses (too many adjectives/adverbs
         piled on top of each other) are avoided.

Medium:  Generally word and image choice is at a more
         ordinary level with some experimentation, not always
         successful, in vocabulary expansion or creation of
         an image.
         The student still lacks full control and some excesses
         or redundancy may occur as well as the occasional
         trite, hackneyed expression.

Low:     Little experimentation with language.
         Reliance on the trite and very ordinary bland or
         abstract expression.
         Occasional errors in the use of standard vocabulary.

This publication includes the criteria Organization, Word
Choice, Conventions and Mechanics, Content/Ideas and
Style, and also includes criteria related to specific modes in
writing such as Narrative: Eye-witness account, real or
imagined; Narrative: Second Person, with emphasis on
description; Narrative: Third Person, emphasis on dialogue;
and Exposition: Presentation of a viewpoint or argument
(which covers six qualities: planning, argument, style,






sentence style, fairness or objectivity and
freshness/originality). However, no criteria are suggested for 'free'
writing, book reviews, reports, etc. It is also suggested that a
scoring scale could be used, with pupils receiving points for
each criterion as follows:



Organization        2    4    6    8   10
Language Choice     2    4    6    8   10
Sentence Variety    1    2    3    4    5
Grammar             1    2    3    4    5
Spelling            1    2    3    4    5



Possible score range: 7-35, if a composite score is thought
useful.
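
The composite range quoted above follows directly from the five point scales. The following Python sketch is illustrative only: it assumes the scale values shown in the table, and the names `SCALES`, `score_range` and `composite` are hypothetical rather than part of the Canadian publication.

```python
# Point scales per criterion, copied from the table above.
SCALES = {
    "Organization": [2, 4, 6, 8, 10],
    "Language Choice": [2, 4, 6, 8, 10],
    "Sentence Variety": [1, 2, 3, 4, 5],
    "Grammar": [1, 2, 3, 4, 5],
    "Spelling": [1, 2, 3, 4, 5],
}

def score_range(scales):
    """Return the lowest and highest possible composite scores."""
    lowest = sum(min(points) for points in scales.values())
    highest = sum(max(points) for points in scales.values())
    return lowest, highest

def composite(marks, scales=SCALES):
    """Add up one pupil's marks, rejecting any mark not on its scale."""
    for criterion, mark in marks.items():
        if mark not in scales[criterion]:
            raise ValueError(f"{mark} is not on the {criterion} scale")
    return sum(marks.values())

# Hypothetical marks for one piece of writing.
example = {
    "Organization": 6,
    "Language Choice": 8,
    "Sentence Variety": 3,
    "Grammar": 4,
    "Spelling": 2,
}

print(score_range(SCALES))  # (7, 35)
print(composite(example))   # 23
```

Because Organization and Language Choice run to 10 while the other three criteria run to 5, the first two carry twice the weight of the others in any composite.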

One of the most well-known analytic scales is that of
Diederich, as discussed in Measuring Growth in English, which
looks like this:





                    Low         Middle        High
General Merit
  Ideas              2     4      6      8     10
  Organization       2     4      6      8     10
  Wording            1     2      3      4      5
  Flavour            1     2      3      4      5
Mechanics
  Usage              1     2      3      4      5
  Punctuation        1     2      3      4      5
  Spelling           1     2      3      4      5
  Handwriting        1     2      3      4      5

                                           Total:



In addition to the table of points, a general description of
high, medium and low performance is given for each
criterion. Under 'Organization', for example, is this
description:

High:   The paper starts at a good point, has a sense of
        movement, gets somewhere and then stops. The paper has
        an underlying plan that the reader can follow; he is
        never in doubt as to where he is or where he is going.
        Sometimes there is a little twist near the end that makes
        the paper come out in a way that the reader does not
        expect, but it seems quite logical. Main points are
        treated at greatest length or with greatest emphasis,
        others in proportion to their importance.

Middle: The organization of this paper is standard and
        conventional. There is usually a one-paragraph introduction,
        three main points each treated in one paragraph, and a
        conclusion that often seems tacked on or forced. Some
        trivial points are treated in greater detail than
        important points, and there is usually some dead wood that
        might better be cut out.

Low:    This paper starts anywhere and never gets anywhere.
        The main points are not clearly separated from one
        another, and they come in a random order, as though
        the student had not given any thought to what he
        intended to say before he started to write. The paper
        seems to start in one direction, then another, then
        another, until the reader is lost.

As an example of a 'surface' characteristic, the descriptions
for 'Handwriting Neatness' are as follows:

High:   The handwriting is clear, attractive, and well spaced,
        and the rules of manuscript form have been observed.

Middle: The handwriting is average in legibility and
        attractiveness. There may be a few violations of rules for
        manuscript form if there is evidence of some care for the
        appearance of the page.

Low:    The paper is sloppy in appearance and difficult to read.
        It may be excellent in other respects and still get a low
        rating on this quality.

What these and similar 'analytic schemes' share is a
reasonably elaborate description of those essay features
expected for levels of writing quality. Although a composite
score can be obtained for any piece of work analysed in this
way, it is not likely to be very useful since pupils with the same
mark could vary greatly in their handling of the individual
features. Analytic marking, therefore, is most useful in
classroom assessment when the reasons for the separate
marks awarded are clearly explained to the pupil. If the
application of this technique revealed class-wide deficiencies
in one or other skill area, further teaching could be organized
to answer these points, as a back-up to the informal
teacher-student dialogue conducted throughout the year.
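
Both points in the paragraph above, that equal composites can hide very different profiles and that analytic marks can expose a class-wide weakness, can be seen in a small worked example. This is only a sketch in Python: the pupils, their marks and the choice of four criteria are invented for illustration.

```python
from statistics import mean

# Maximum mark for each criterion used in this illustration.
MAXIMUM = {"Ideas": 10, "Organization": 10, "Punctuation": 5, "Spelling": 5}

# Hypothetical analytic marks for three pupils.
marks = {
    "Pupil A": {"Ideas": 10, "Organization": 8, "Punctuation": 1, "Spelling": 1},
    "Pupil B": {"Ideas": 4, "Organization": 6, "Punctuation": 5, "Spelling": 5},
    "Pupil C": {"Ideas": 8, "Organization": 6, "Punctuation": 2, "Spelling": 4},
}

# The composite totals are identical although the profiles are not.
totals = {pupil: sum(scores.values()) for pupil, scores in marks.items()}

# Class average on each criterion, as a fraction of its maximum,
# so that 10-point and 5-point scales can be compared.
class_profile = {
    criterion: mean(scores[criterion] for scores in marks.values()) / top
    for criterion, top in MAXIMUM.items()
}

# The criterion with the lowest class average is a candidate
# for further whole-class teaching.
weakest = min(class_profile, key=class_profile.get)

print(totals)   # every pupil totals 20
print(weakest)  # Punctuation
```

All three invented pupils total 20, yet Pupil A needs help with mechanics and Pupil B with ideas; only the separate marks show this.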

A single mark or grade made by amalgamating all the
analytic scores, however, is an insufficient indication to a
pupil of his writing progress. Written comments would have
to be added as well, in which a careful evaluation was made of
both the good and inadequate aspects of the pupil's
performance on that task. Diederich, for example, has shown
that the procedure with the most consistently positive effect on
students' motivation is to correct one particular type of error,
and to provide a comment on one particular strength in the
student's piece of writing. In this way the comments are more
likely to be taken to heart and kept in mind by the student,
particularly if they are presented in an encouraging manner.
A study conducted by Page showed that students who receive
individualized comments from the teacher obtain the highest
scores, compared to students receiving automatic, impersonal
comments (e.g., 'Good Work') or only a mark. The
relationship between supportive feedback and student
improvement is a subtle one, and Diederich's advice is
especially worth noting.



2 Evaluating the Year's Performance

End-of-year grades or marks are not an integral part of the
learning process. But they do provide an estimate of the
amount and kind of learning achieved by the student, as their
main function is usually to distinguish students from each
other, to provide a comparison or ranking.

Holistic Marking 

Evaluating writing skills is difficult because of the integrated
nature of a piece of writing and differences in markers'
approaches. The assessment technique which takes this into
account is impressionistic (or holistic) marking. Research has
shown that a rapid overall judgement of the quality of a piece
of writing is as reliable a technique as the much slower method
of analytic marking. Using this holistic technique, the marker
reads quickly through each pupil's script in order to assign a
mark or grade to it on the basis of his or her view of an
adequate performance. Separate assessments of individual
features are not made.






As a check on the consistency of the marking, essays can be
sorted into three approximately equal piles representing
good, average and poor efforts, with each of these piles being
sorted again into three piles, making nine in all. Thus essays
in pile 3 can be compared with pile 4, etc., to ensure that (a)
there are differences in quality between each of the
neighbouring piles and (b) essays within each pile are of
similar quality. With practice this checking process can also
be completed relatively quickly.
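
The two-stage sort described above can be sketched in a few lines of Python. The sketch assumes the marker has already placed the scripts in rough order of merit by impression, best first; the function names and the 18 placeholder scripts are hypothetical.

```python
def split_into_three(items):
    """Split a list ordered best to worst into three near-equal piles."""
    size, extra = divmod(len(items), 3)
    piles, start = [], 0
    for i in range(3):
        # The first `extra` piles take one additional item each.
        end = start + size + (1 if i < extra else 0)
        piles.append(items[start:end])
        start = end
    return piles

def nine_piles(ranked_essays):
    """Sort into good/average/poor, then split each pile into three again."""
    piles = []
    for broad_pile in split_into_three(ranked_essays):
        piles.extend(split_into_three(broad_pile))
    return piles

# Hypothetical: 18 scripts already ranked by overall impression.
essays = [f"essay {n}" for n in range(1, 19)]
piles = nine_piles(essays)

# Neighbouring piles are the ones to compare, e.g. pile 3 with pile 4.
for number, pile in enumerate(piles, start=1):
    print(number, pile)
```

The code only does the bookkeeping. The judgement that neighbouring piles really differ in quality, and that essays within a pile are alike, still has to be made by rereading.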

When a team of markers is involved in this activity, checks
are required to ensure that all the markers have comparable
standards. Normally this is done twice: once before any
assessments are made so that everyone involved knows what
is being looked for (that is, the criteria of adequate
performance) and, secondly, after the assessments have been
made in order to check for any large inter-marker differences.
The range of marks awarded by each marker needs to be
examined too. Obviously, some markers are more harsh in
their judgements than others, and may use a more restricted
range of marks in which, for example, the high ones tend to be
avoided except perhaps for an outstanding response. Others
will be more lenient, and may fail only students with
excessively poor answers. Some bunch their marks around
the middle. Consequently, it is necessary to be very clear
about the characteristics expected of answers at each point of
a scale, and to ensure that each marker agrees with them prior
to assessment. Even then differences will probably occur. But
although personal biases can never be completely removed,
working closely with other teachers will assist the process of
ironing out both foreseeable difficulties and any systematic
bias due to identifiable idiosyncrasies.
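
A quick after-the-event check of this kind might look like the following Python sketch. The three markers and their marks out of 20 are invented, and the mean, range and standard deviation are only one reasonable choice of summary, not a procedure prescribed by the research.

```python
from statistics import mean, pstdev

# Hypothetical marks out of 20 given by three markers to the same six essays.
marks = {
    "Marker 1": [6, 8, 9, 11, 12, 14],
    "Marker 2": [10, 12, 13, 15, 17, 19],
    "Marker 3": [10, 10, 11, 11, 12, 12],
}

def summarise(scores):
    """Average, lowest, highest and spread of one marker's marks."""
    return {
        "mean": mean(scores),
        "low": min(scores),
        "high": max(scores),
        "spread": pstdev(scores),
    }

summary = {marker: summarise(scores) for marker, scores in marks.items()}

# Lowest average suggests the harshest marker; smallest spread
# suggests a marker who bunches marks around the middle.
harshest = min(summary, key=lambda m: summary[m]["mean"])
most_bunched = min(summary, key=lambda m: summary[m]["spread"])

for marker, stats in summary.items():
    print(marker, stats)
print(harshest, most_bunched)
```

In this invented data Marker 1 is harshest, Marker 2 most lenient, and Marker 3 uses only three points of the scale, which is the kind of difference the team would then discuss.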

What other sources of variation can be guarded against?
The questions students are required to answer need to be
devised very carefully. Rosen, for example, has shown that in
a list of essays, from which a pupil is required to choose only
one, different essays may make very different linguistic,
content and organizational demands. It has also been shown
that students, when given a choice of questions, do not
necessarily answer the ones they can obtain their best marks
on. Ambiguity in question phrasing has to be guarded
against, too, as some pupils may interpret their tasks quite
differently from other pupils when confronted with the same
essay question, and do badly.

With especially important examinations, it is sometimes a
healthy practice to use more than one marker. This reduces
personal bias and, where a pupil has interpreted a question in
an unusual fashion, for example, provides an alternative
opinion of the quality of the pupil's writing. Multiple
marking of the same papers is generally preferable to a single
rating and does not take a long time when the impressionistic
technique is used. It also results in greater consistency
between markers in their assessments.
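
Where two markers have read the same scripts, their marks can be combined and any large disagreements picked out for a further look. The Python sketch below is illustrative only: the function `combine`, the tolerance of 3 marks and the sample marks are assumptions, not figures from the research.

```python
def combine(first, second, tolerance=3):
    """Average two markers' marks for each script.

    Scripts on which the markers differ by more than `tolerance`
    marks are listed separately so they can be reread or discussed.
    """
    combined, to_review = {}, []
    for script in first:
        a, b = first[script], second[script]
        combined[script] = (a + b) / 2
        if abs(a - b) > tolerance:
            to_review.append(script)
    return combined, to_review

# Hypothetical impression marks out of 20 from two markers.
marker_one = {"script 1": 14, "script 2": 9, "script 3": 17}
marker_two = {"script 1": 15, "script 2": 15, "script 3": 16}

combined, to_review = combine(marker_one, marker_two)
print(combined)   # {'script 1': 14.5, 'script 2': 12.0, 'script 3': 16.5}
print(to_review)  # ['script 2']
```

A script flagged in this way may simply be one where the pupil interpreted the question unusually, which is exactly the case in which a second opinion is most valuable.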



3 Performing an Evaluation: A Checklist

Consider these points carefully:

I Why are you making the evaluation?

Remember that initial assessments serve a different 






function from those made during the year, and especially
from those which attempt to sum up a whole year's
work. For example, is your evaluation designed to
provide an overall judgement of a pupil's writing
ability? If so, ideally it will be based on a range of writing
tasks, as one task alone is hardly representative.

II What do you hope the outcome will be?

The way the information obtained will be used is
probably more important than the method adopted. Is
it mainly to help your students improve their writing
skills, to widen your knowledge of their abilities, or to
provide a means for comparing students with each
other?

III Choosing appropriate techniques:

a. To obtain a deeper understanding of your pupils'
   writing ability, ask yourself the questions listed
   on page 2.

b. To assist pupils to improve their writing, follow
   the procedures listed on page 3.

c. When a mark is required on a piece of writing done
   during the year, work carefully from a set of explicit
   criteria. The features listed on pp. 4-5 will
   assist here, though they will have to be adapted for
   different class levels. The marking schemes on
   page 6 may also be useful.

d. Remember that positive written comments are
   required as well as marks. These should be recorded
   in the markbook too.

e. When assessing end-of-year work, be very clear
   about the criteria students are expected to meet
   (i.e., the characteristics of an adequate answer)
   and conscientiously try to avoid potential sources
   of inconsistency.

f. When part of a team of markers, work together
   both before and after your marking to remove
   idiosyncrasies due to different 'standards'.

g. Multiple marking of the same papers is a sound
   practice for especially important exams or
   assignments.

IV Some pitfalls to avoid: 

a. Try not to focus solely on the 'mechanics' of
   writing. Excessive correction of pupils' written work
   is unlikely to induce better writing.

b. There is no need to assess everything that is written
   in the classroom. Formally evaluate only work
   considered by the student to be a finished effort.
   Allow students to revise, especially their
   coursework.

c. Do not mystify students by adopting marking
   'standards' unknown to your pupils. Make your
   expectations known; make them reasonable!

d. Make sure questions and topics are not ambiguous;
   if they are, make allowance for this in your
   evaluations.



Notes 



The quotation in the introduction is from the Ontario Ministry of
Education, Evaluation and the English Program, 1979, p. 15.

Teaching and the Writing Process
Research on the writing process includes:

Cooper, C.R. and Odell, L. (eds.) Research on Composing: Points
    of Departure, N.C.T.E.: Illinois, 1978.
Emig, J. The Composing Processes of Twelfth Graders, N.C.T.E.
    Research Report, 13: Illinois, 1971.
Graves, D.H. 'An Examination of the Writing Processes of Seven
    Year Old Children', Research in the Teaching of English, 9, 3, 1975,
    pp. 227-241.
Graves, D.H. Balance the Basics: Let Them Write, Ford Foundation:
    New York, 1978.

Collins and Gentner's study, 'A Framework for a Cognitive Theory
of Writing', can be found in:
Gregg, L.W. and Steinberg, E.R. Cognitive Processes in Writing,
    Erlbaum: New Jersey, 1979.

Some useful references on assisting the developing writer are:
Hillerich, R.L. 'Developing Written Expression: How to Raise-not
    Raze-Writers', Language Arts, 56, 7, October 1979, pp.
    769-777.
Stibbs, A. Assessing Children's Language, Ward Lock Educational:
    London, 1979.
Thornton, G. Teaching Writing: The Development of Written
    Language Skills, Edward Arnold: London, 1

The importance of revision as part of the writing process is
discussed in:

Calkins, L.M. 'Children's Rewriting Strategies', Research in the
    Teaching of English, 14, 4, December, 1980.
Graves, D.H. 'What Children Show Us About Revision', Research
    Update, Language Arts, 56, 3, March, 1979.

While Mina Shaughnmy discusses the kinds of mistakes made by 
first year Coltefc students in New Yortt, many of her obf^ervations 
reoratmendatimis are particularly iseful for teiK:J}ers of all levels 
in New Zeaiami and Austr^. They are presented in: 
Sfaauf^mes^, MJP. Errors and Exp&aatiam: A Guide far the Teacfmr 
cf Basic Wririr^, Oxford Univcnwty Press: New York, 1977. 

What Influences the Awarding of Marks?

Research on this topic is extensive, particularly on the reliability of
essay markers. The references given here are a tiny selection only.

The effects of 'surface' features on markers, for example, can be
found in:

Briggs, D. 'The Influence of Handwriting on Assessment', Educational
Research, 13, 1970, pp. 50-55.
Marshall, J.C. and Powers, J.M. 'Writing Neatness, Composition
Errors, and Essay Grades', Journal of Educational Measurement,
6, 1969, pp. 97-101.

For the effects of different marking criteria, see these early studies:
Diederich, P., French, J.W. and Carlton, S. 'Factors in Judgements
of Writing Ability', E.T.S. Research Bulletin, 61-15; Princeton,
N.J., 1961.

Remondino, C. 'A Factorial Analysis of the Evaluation of Scholastic
Compositions in the Mother Tongue', British Journal of Educational
Psychology, 1959, pp. 242-251.

A useful summary of inter-marker and intra-marker reliability (i.e.,
marking differences in the same person), with special reference to
essays is:

Croft, Cedric. 'Using the Essay as an Assessment Technique', set 77,
no. 1, NZCER, 1977.

Assessment Techniques 

(i) The analytic marking schemes described can be found in: 
Diederich, P. Measuring Growth in English, N.C.T.E.: Illinois,
1974.

Evans, P.J., Brown, P. and Marsh, M. Criteria for the Evaluation
of Student Writing Grades 7 and 8, A Handbook, O.I.S.E., 1977.

(ii) For holistic marking see:

Cooper, C.R. 'Holistic Evaluation of Writing' in Evaluating Writing:
Describing, Measuring, Judging, edited by C.R. Cooper and
L. Odell, N.C.T.E.: Illinois, 1977.
Greenhalgh, C. and Townsend, D. 'Evaluating Students' Writing
Holistically - An Alternative Approach', Language Arts, 58, 7,
October 1981, pp. 811-822.

(iii) A standard reference for teachers interested in essays as an
examination technique is:

Coffman, W.E. 'Essay Examinations' in Educational Measurement,
edited by R.L. Thorndike, American Council on Education:
Washington, 2nd ed, 1971.

(iv) The importance of comments teachers make is discussed by:
Searle, D. and Dillon, D. 'Responding to Student Writing: What is
Said or How it is Said', Language Arts, 57, 7, October 1980, pp.
773-781.

Wade, B. 'Responses to Written Work: The Possibilities of Utilizing
Pupils' Perceptions', Educational Review, 30, 2, 1978, pp. 149-158.

Wade also cites the findings of Page and Rosen.






Observation: 

The Basic Techniques 



Bruce McMillan and Anne Meade



Introduction

Most of what we know about children comes from watching carefully what
[...] minds with the latest facts on their offspring's achievements, and as
promptly discuss some problem, often taking a simple question such
as, 'When did your child begin to walk?' or 'When did they stop sucking
their thumbs?'

Teachers often ask more complex questions, such as how children of
a certain age can be expected to interact with each other or how they
learn difficult concepts. Talking with other experienced people helps.
Reading textbooks helps. But watching children is both more interesting,
[...]
Observing, finding out what children do, seems a simple process.
And most of the time it is. But when the observations have to count
for an important purpose we find that different people see different things.
This is quite usual. Think of a car accident and the evidence given in Court
- it seldom tallies exactly and if it did there would be suspicion of collusion
amongst the witnesses! Similarly some fighting between two small children
will be seen differently by the two mothers, by the preschool supervisors
and by the children.

What gives us our own particular and therefore somewhat unreliable view
of what happens? Expectations are a common obstacle to good
observation. Just because David was in a fight yesterday doesn't mean
he must have started the one today, but it might make us inclined to think
so. There are plenty of other obstacles. Scientific observation has to be
deliberate and systematic, carried out with care and proper preparation.






The Uses of Observation



Observing children carefully and systematically enables us to go beyond
guesswork or assumption, or bias. But we need a variety of techniques,
each with its own range of uses. Some of the uses are suggested here.

1. We can use observation to describe behaviour or characteristics.

Many statements about children are simplistic generalisations which just
put a label on a child; they do not describe him or her. 'Mary is terribly
shy' or 'Michael is hyperactive' are examples. The labels 'shy' or
'hyperactive' tell us very little about the child. They say nothing of the
good things they may do; nothing about the circumstances in which Mary
may be shy or may be quite happy to interact with others; nothing about
the range of activities Michael does engage in, or how long he spends at
them. When we have carried out systematic observations of a child, then
we are entitled to draw the evidence together, and say, for example, 'Mary
spends only a small part of her time at preschool playing co-operatively
with others, and tends to go elsewhere if there are more than two children
playing where she has been'. Or we may conclude that 'Michael stays at
an activity for an average of two minutes only, and seldom talks with
an adult while he is playing'. In both these cases, observation allows us
to describe the child/children more accurately, and suggests aspects of their
behaviour which could be attended to more carefully.

2. We can use observation to monitor a child's development.

Relatives or friends who see children only at irregular intervals frequently
comment on how much they have grown, or changed. But those who are
in constant contact with the child can fail to 'see' such developments, for
they often involve slow processes of physical growth, or the acquisition
of social skills, or developing thinking abilities. It is a relatively simple
matter to measure a child's height every month or so. It takes rather more
skill to observe and record other developmental progress. But it can be
done, and can provide important information to those who are
responsible for helping this progress.

3. We can use observation to examine children interacting.

Sometimes we need to know about the group of children, rather than about
any particular one in the group. How does the group decide what to play?
How does the group develop an idea so that the whole nature of the play,
or other activity, changes? How does the group set about including or
excluding some particular child? When does a friendly tussle turn into an
angry fight? In all these cases we need the skill of looking carefully at all
that is going on, rather than focusing only on one or two children. The
results of such careful observations easily justify the time spent: we
emerge with a much clearer picture of what is happening.

4. We can use observation to examine particular play or learning situations.

Sometimes, we need to look at an activity or a particular situation, rather
than any one child or group of children. In a preschool, for example, we
may find some people complaining that one corner is never used, or
another one is always left untidy. It is only too easy to lay the blame for
such circumstances on to the things which immediately take the eye. A
more effective procedure is to observe carefully, for some time, to check
exactly what does happen, who comes to play there, and the sequence
of events as they play.

5. We can use observation to check activities, programmes or equipment.

It is very important that when changes are introduced there is some way
of monitoring the effects they have. It is appropriate to have a set of
observations before and after, or perhaps during the change. When a new
item of play equipment is introduced, for example, does it take children
away from other, equally valuable, activities? If so, how long does the
effect last? Or, when the layout of a centre is changed, do children begin
to move differently between activities? Do they spend more or less time
on some of them? Only when questions such as these have been answered,
do we have the information on which to base a judgement, and conclude
that 'this' equipment is better than 'that' or 'this' arrangement better than
'that'.



When parents, teachers and caregivers are setting out to understand
children and the environment in which they live, careful, systematic
observation is the most important tool which they can use. It involves
planning, not just casual observation. It involves careful thought
about the purposes for which the observations are to be used. The
more complex the purposes, the more time and effort is required in
planning and carrying out the observations.






Observation Techniques 

1. Diary description

The diary description is a fairly informal account of some aspects of the
child's development. The recorder usually makes notes of any events which
happen to interest him or her, over a period. Parental pride may encourage
us to record our child's first language, or motor skills such as walking,
for example. Or a teacher may make a few quick notes about a child's
first day at preschool, or school, and occasional days following that. It
probably means that these occasional diary entries are made whenever
the right mood happens to strike the observer, rather than on any
systematic basis. Because of this, the diary record can be quite inadequate
as a sound description of the child, or any specific aspects of his or her
development. The observer can be biased towards certain kinds of
situation, or perhaps select only some particular things to record. This
means that usually no great use can be made of the observation records.

Nevertheless, diary descriptions do have their uses. The observer is very
interested in some aspects of the child's behaviour, and prepared to take
the time to note down impressions. Even beginning such a simple recording
can help to sharpen the observer's awareness of what is going on. That
can, in turn, lead to the realisation that different types of interesting
behaviour are appearing, or that there may be relationships between some
of the things that have happened.

For example, the parents' diaries interspersed through Margery
Renwick's To School at Five (Wellington, NZCER) illustrate how
such records can capture changes. The children were experiencing a major
life transition and the diary descriptions suggest the contexts that go with
relatively smooth transitions.

Thus, diary descriptions can be the springboard to further, more
systematic observations. These observations will be specifically aimed at
finding out the answers to the questions that arose from whatever we
happened to notice.

Diary descriptions would obviously be helped by photographs; the
family photo album can be a record of the development of children in
a family. Film can have the same use. But written records are most
common, and have the advantage that only pen, paper and a little thought
are necessary. Charles Darwin observed and recorded the behaviour of
his son in this way because his scientific background helped him to see
the value of careful description.






Here are some more examples. The first two are entries from a diary
kept by a mother who was particularly interested in language, and who
simply wanted to keep a record of her child's progress in this area.

B's day was at an end. Bathed and fed he sat wrapped in a shawl on
my lap to have his evening 'talk' with me. All at once he looked intently
at the wall at my back. The evening sun lay on it in a broad golden
band mirroring the window and [...] by the black shadows of moving
leaves. Intently he watched it all, then looked up at me with a smile;
he uttered a delicate sound, and looked back. With that sound he spoke
to me of all the loveliness he was seeing, and wanted to know whether
I saw it too.

February 1st: A good while ago, B used to bring me his shoes when
I pointed to them and said 'bring me the shoes!' This and similar
occurrences might have suggested that he understood my utterance. But
it is quite possible that my pointing to the shoes and saying something
or other was sufficient to suggest it to him. But now, B definitely
understands whole phrases. When I say to him 'we will go to the
bathroom', up he gets, and goes to the door in order to patter along
the hall-way and play with the empty shampoo [...] in the bathroom.

Another example preserves the amusing and perhaps half understood
ideas of a small boy. It was first published in 18[...]2.

R came into the house eating a horse-chestnut.

Grandma: Well R, if you eat that horse-chestnut you'll die and go to
heaven with your mother, and then I shan't have any R. (His mother
was dead.)

R: Well I'll go out and get four chestnuts, one for grandpa, one
for you, one for Aunt H and one for me. I'll eat mine first then I'll
die and go to heaven first. Then grandpa'll eat his and he'll die, then
you'll eat yours and we'll all be up there together. Won't that be nice,
grandma?

(Cited by Herbert F. Wright, 'Observational Child Study', in Paul H. Mussen (ed.) ...)

2. Running record

A running record provides a description of one child's behaviour over a
period. It is one way of building up a careful description of what the child
does, but depends on the record containing a good description of the
environment as well. In other words, it attempts to provide an account
of what the child does from moment to moment, in a particular setting.
Usually, we attempt to record as much of the behaviour as possible.
That is difficult and we find that about 15 to 20 minutes is as much as
we can do at one period. However, a really useful record can be built up
by doing a number of observations, with either a few minutes or hours,
or days in between.

The major advantage this technique has is that, by trying to note down
on paper (or into a tape-recorder) everything that happens, we can see the
complex networks of interactions a child has with others, and with the
environment. The major disadvantage is that so much is likely to happen
that we become selective, or begin to lack precise descriptions of what goes
on. In such cases, running records can be misleading, and more precise
techniques must be brought into use.

Here is a small example of one four year old girl's behaviour, over a
four minute period. It tries to convey the general picture, but some of the
detail of the picture 'washes out', since there is so much to try and record.

A at paste table with mother. Mother leaves table and A gazes around
dreamily at the other children. A now leaves the paste table without
doing any work, she goes over to dough table, but there are no empty
seats and she doesn't know where to go. She is rather uneasy looking
around her. Mother then takes her by the hand into the dolls' corner
where she stands beside the tea-set table just watching the other children
play. While still standing at table she pulls out the chairs and pours
the (empty) teapot into the cups, leaves and wanders around the dolls'
beds, picks up some dolls' clothes from the floor then just throws them
down again, gives the rocking bed a push as she passes then takes doll
from pushchair, puts it carefully to sleep in the pram, tucking the rug
gently around the doll, leaves pram and settles another doll into a
bed, tidying blankets and folding sheets, then once again tidies the dolls'
bed very neatly. (Solitary play.) A leaves corner, walks slowly towards
dough table all the time sucking her finger (still no room), still sucking
finger she moves slowly over to paste table, takes some black paper
over to table and starts to paste.

By opening up a wide range of possible behaviour (or events) to observe,
running records can be most valuable.

Pamela Kennedy questioned whether entry to school at five is a
transition or trauma (Early Childhood in New Zealand, Second Early
Childhood Care and Development Convention, 1979). Through running
record observations of children around five years old in pre-school and
junior class settings she was able to show how children in the two settings
had different opportunities for motivation, concrete experiences, social
interaction and keeping things in balance. These are aspects of growth which
Piaget considered important for children if they are to develop their concrete
operational thought and reasoning. Thus, running records of several
children provided for Pamela Kennedy generalised data on child
development at a specific age.

3. Time sampling (of individuals or of the use of an activity)

This is a development of the running record. A beeper or some timing
device gives the observer a beep and only what is happening at that
moment is written down. Then, after a gap, another beep and what is
happening is noted again. Thus the behaviour is 'sampled' at pre-arranged
times, usually at 1, 5 or 10 minute intervals.

In this way, time-sampling is rather like taking a slice of time out of
a running record. Or it is like taking a single frame out of a movie film
at regular intervals, so that each frame can be looked at more carefully.
When sufficient samples of time have been taken, we can start to draw
the threads together.

With time sample observations of an individual child we may, for
example, discover that whereas we thought Mark played with other children
quite regularly, it just so happened that we noticed him only when he was
playing with others. Now that we have looked more systematically at his
play, we realise that he was playing with other children for only two of
the twelve samples we took over an hour: that is hardly regular social
contact.

With time sample observations of an activity area, we may, for example,
discover that blocks are seldom used until after morning tea. We realise
that it is not until the children have sat nearby that they remember the
block corner tucked behind shelving. By use of the same technique, we
may discover that as the number of boys using the blocks rises (through
any hour of regular observations) the number of girls using them declines.

A time sample of all activity areas during free play can provide
information on how children are dispersed, how popular different activities
are, and the patterns of play at the beginning or end of the sessions. As
the number of cases or activities you want to observe increases, so the
information to be gathered needs to be made (or kept) simple.

One value this technique has is simplicity. A wrist watch with a
sweep-second hand or running seconds is all that is actually needed for timing.
By 'sampling' times during the preschool session, or during the day at home,
or even over periods of days an accurate record can be built up.
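
The arithmetic behind a time sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 5-minute interval, the one-hour session and the record for one child are all invented for the example and are not data from any study mentioned here.

```python
from datetime import datetime, timedelta


def sampling_times(start, session_minutes, interval_minutes):
    """Return the clock times at which the observer looks up and records."""
    count = session_minutes // interval_minutes
    return [start + timedelta(minutes=i * interval_minutes) for i in range(count)]


def proportion_observed(samples):
    """Fraction of sampling moments at which the behaviour was seen."""
    return sum(samples) / len(samples)


# Hypothetical one-hour session sampled every 5 minutes (12 samples).
times = sampling_times(datetime(2000, 1, 1, 9, 0), session_minutes=60, interval_minutes=5)

# Hypothetical record for one child: True means "playing with others" at that moment.
record = [False, False, True, False, False, False,
          False, True, False, False, False, False]

for moment, with_others in zip(times, record):
    print(moment.strftime("%H:%M"), "with others" if with_others else "alone")

print(f"With others in {sum(record)} of {len(record)} samples "
      f"({proportion_observed(record):.0%})")
```

Keeping the record as one yes/no value per sampling moment is a deliberate simplification; a real sheet would usually also carry a short note of what was happening at each moment.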




4. Time sampling categories

This is similar to number 3, except for two considerable refinements.
Recording is reduced to writing a simple code or putting a check mark
alongside a list of the sampling moments. It signifies just the presence (or
absence) of a specific behaviour. With such a simple task for the observer
it becomes possible to observe groups of children as well as individuals.
This procedure is widely used in behaviour analysis studies where, for
example, it can provide systematic records of the frequency of problem
behaviour. This can be done during baseline surveys and during treatment.

It is, at first glance, quite a simple procedure, for it avoids many of the
problems you get when long descriptions of behaviour have to be written
down. But this simplicity is deceptive. The category used for the behaviour
you want to observe must be very carefully defined. A category label such
as 'creative play' [...] means virtually nothing, until you have gone
through the difficult task of defining it, and giving examples of the kinds
of behaviour you could or could not [...] only in this way [...]

An example of a sample recording chart, in which appropriate (A) or
inappropriate (I) behaviour by three children was noted, is shown below.
Note that it is very easy to see what the average amount of 'appropriate'
behaviour is. For Harry and Tom it is 50% of times, but for Dick, it
is 70%. What 'appropriate behaviour' is must be very carefully
spelled out.



[Sample recording chart: sampling times through the day are listed down the left-hand side, with one column for each of the three children; each cell is marked A (appropriate) or I (inappropriate). The individual entries and the source note beneath the chart are not legible in this copy.]
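
The percentages read off a chart of this kind are simple to work out. The sketch below assumes each child's column has been copied out as a string of A and I codes, one per sampling moment; the child labels and the codes themselves are invented for illustration and are not the entries of the chart above.

```python
def percent_appropriate(codes):
    """Percentage of sampling moments coded 'A' (appropriate)."""
    return 100 * codes.count("A") / len(codes)


# Hypothetical columns: one code per sampling moment, ten moments per child.
chart = {
    "Child 1": "AAIAIIIAIA",
    "Child 2": "IAIAIAIAAI",
    "Child 3": "AIAAAIAAIA",
}

for child, codes in chart.items():
    print(f"{child}: {percent_appropriate(codes):.0f}% appropriate")
```

The calculation is only as good as the definition of 'appropriate' that the observer applied when writing each code.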

Where a trained observer can be present in a class or centre, it is possible
to have much more careful attention paid to the numbers (and percentages)
of children who are engaged in the various classroom activities. In this
way, an accurate picture of the extent of interest and attention throughout
the day can be built up, and steps taken to remedy any shortcomings in
the programme (for further reading on this, see the paper by Glynn
referred to above or Carol A. Cartwright and G. Phillip Cartwright,
Developing Observation Skills, New York: McGraw Hill, 1974; or Todd
R. Risley and Michael Cataldo, Planned Activity Check, Kansas City:
Center for Applied Behaviour Analysis, 1973).

5. Interval recording

This is a further refinement of time sampling, with the teacher or researcher
focusing on (i) one person only (ii) a number of categories of behaviour.
In this case, a simple 'yes' or 'no' is recorded to the question, 'did the
behaviour occur?' For example, it is common to observe for 10 seconds,
looking for any behaviour fitting one or more of the categories. Then there
is a 5-second interval for making the check marks, followed immediately
by the next 10-second observation period. And so on for 15 minutes or so.
An example: Marsha Weinraub and Jay Frankel ('Sex differences in parent-
infant interaction during free play, departure, and separation', Child
Development, 1977, v.48, pp. 1240-1249) were able to show that:

Parents talked to, got down on the floor to play with, and tended to
share play more with same-sexed than opposite-sexed infants (aged 15
months to 21 months) . . . When infants were close to their mothers,
mothers were more likely to look, vocalise, touch, sit on the floor and
share play with their children. This was not true of fathers.

How did they know this?

Parent and infant behaviours were observed from behind a one-way mirror
. . . Using checklists, the occurrence or non-occurrence of particular
behaviours within 5-second intervals was observed; a 1-second interval
was used to record the data. An audio recording delivered to the observers'
headphones cued the observers when to observe and when to record. Parent
behaviours included looking at the infant, reading magazines, and sitting
on the floor.

A New Zealand example is Anne Meade's [...] Teaching in New Zealand
Early Childhood Centres (Wellington, NZCER, forthcoming). The
observers were able to show that 'there was less adult-child talk than most
early educators expected, [and] adults getting down to child level was
associated with more sustained conversations .... Trained staff did more
to foster children's learning through talking to them and through play
involvement than parent helpers' (p.4).

In this case, the observers shadowed on-duty adults and when a beeper
gave the cue (every [...] seconds) the observers checked which of 17 types
of behaviour occurred in the 3 seconds following. The categories covered
adult-child talk, affective (emotional) behaviour, play involvement and
adult-adult interactions.

It can be done, but it is difficult, to use this technique without some
specialised equipment to produce the sound which signals when to switch
from observing to recording. Electronic timers are inexpensive but need
modification to get them to beep repeatedly. However this is a technique
you should think of carefully: it combines (i) the sampling of behaviour
needed for systematic recording with (ii) the narrow focus on a few specified
types of behaviour required for detailed analyses. It could be adapted to
use more simply: a timekeeper whispering the 'beeps' in the observer's ear
could do the job. The schedule for interval recording could look like this
example (a tick (✓) indicates that the behaviour was seen in that interval):



Time            Category of behaviour
interval        1      2      3      4

1
2
3
4
5
etc.
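
The cue times for an interval-recording session, and the summary taken from a filled-in schedule, can be sketched as follows. The 10-second observation and 5-second recording periods over 15 minutes follow the example in the text; the function names, the category numbers and the ticks entered on the sheet are assumptions made up for the illustration.

```python
def interval_cues(total_seconds, observe_seconds, record_seconds):
    """Yield (interval number, observe start, record start), in seconds from the start."""
    cycle = observe_seconds + record_seconds
    for n in range(total_seconds // cycle):
        start = n * cycle
        yield n + 1, start, start + observe_seconds


def percent_of_intervals(sheet, category):
    """Percentage of intervals in which the given category was ticked."""
    hits = sum(1 for ticked in sheet.values() if category in ticked)
    return 100 * hits / len(sheet)


# 15 minutes of 10-second observation followed by 5 seconds of recording.
cues = list(interval_cues(15 * 60, observe_seconds=10, record_seconds=5))

# A blank schedule: one (initially empty) set of ticked categories per interval.
sheet = {number: set() for number, _, _ in cues}

# Hypothetical ticks entered by the observer.
sheet[1].update({1, 3})
sheet[2].add(3)
sheet[5].add(2)

print(len(cues), "intervals; the first three cues are", cues[:3])
print(f"Category 3 was seen in {percent_of_intervals(sheet, 3):.1f}% of intervals")
```

Because each interval records only whether a behaviour occurred, the result is a percentage of intervals, not a count of separate events.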



6. Event recording

The focus of attention is an event, for example, the occurrence of a certain
sort of behaviour. Time intervals between events are not important. This
approach can be as simple as recording the number of times a child uses
a certain word, or plays co-operatively with others. Events can be recorded
using pencil and paper, a wrist 'golf-counter', a knitting counter, by
electronic event recorders, or even by transferring pebbles from one pocket
to another.
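
A tally of this kind is the software equivalent of the knitting counter. The short sketch below uses Python's standard `collections.Counter`; the event labels and the sequence of observed events are hypothetical.

```python
from collections import Counter


def tally_events(observed):
    """Count how many times each kind of event was noted, ignoring when it happened."""
    return Counter(observed)


# Hypothetical events noted, in order, during one session.
session = [
    "uses target word",
    "plays co-operatively",
    "uses target word",
    "uses target word",
    "plays co-operatively",
]

tally = tally_events(session)
for event, count in tally.most_common():
    print(f"{event}: {count}")
```

Only the counts are kept here, which matches the point that time intervals between events are not important for this technique.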

Here is an example of a complex procedure: one study showed how
much preschool children reinforce each other's social interactions. It is no
surprise, of course, to find that someone being nice or nasty to you tends
to make you nice or nasty back to them: the important point is that the
number, kind and frequency of specific events were revealed. (See Michael
P. Leiter, 'A study of reciprocity in preschool play groups', Child
Development, 1977, pp. 1288-1295.)

In his study, Michael Leiter observed for 15-minute periods, for a total
of 40 hours, with several carefully-defined categories:

. . . all social initiations made by a target child are recorded, along with
the immediate social response to the initiation . . . The coding record
was made on blank sheets of paper on a clipboard with felt pen ....
Child-child interactions were recorded for 16 subjects during free play time
at their preschools . . . Each child was observed for 20 15-minute
intervals yielding a 5-hour record of behaviour for each child. It was found
feasible to observe two children simultaneously.

Event recordings do not have to be as complex as this. A simple technique
would be to watch children's play at a particular activity, and note, say,
the frequency of conversations or physical contact. A New Zealand
example is Elizabeth Connelly's study of the sandpit and block areas in
a number of Auckland kindergartens. She noted rejection events and found
that only a small proportion of girls were actually rejected from these areas
during the fourteen sessions observed. This data led her to look for other
explanations for the girls' low participation rate. (Massey Certificate in
Early Childhood Education Special Topic, 1974.)

7. Duration recording

The aim of this procedure is to establish how long a particular behaviour
lasts. A stop-watch is all that is required if a single child is to be observed,
but more useful information about matters such as time spent in various
activities is gained by using a prepared sheet, and filling it in, as in this
example.

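[Chart: a prepared duration-recording sheet headed 'Activity: Blocks'. Children (including Kathy and Mark) are listed down the side, a time scale from 9.00 to 9.30 runs across the top, a bar against each name shows when that child was at the activity, and a final 'Total' column gives the time spent; totals of 5 and 7 are shown for the first two children.]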



A more complex variation of this schedule was used in [...] for a study
of girls' participation in play with blocks and car cases in six kindergartens
(an NZCER/Society for Research on Women in New Zealand project).
Teachers' names were added as well. Thus, it was possible to check whether
more girls played for longer durations when the teacher(s) were present
(and, similarly, whether the rates changed when boys joined in). This study
will be reported in the second issue of set for 1985.

A different approach to this is to study one child, with one specific
category of behaviour in mind, perhaps after such a comment as 'Lee is
always fighting other children'. The example below was prompted by
someone saying, 'Mary is always wandering aimlessly around'.

Anne Smith provides a sample schedule for Mary. Duration recording
showed clearly that Mary wandered aimlessly on four occasions and only
one-third of her time was spent this way when calculated thus:

\[
\text{Percent wandering} = \frac{\text{time wandering whilst observed}}{\text{total time observed}} \times 100
\]

The observation schedule (the form filled in by the observer) included a
precise definition of all behaviour which could be classed as 'wandering
aimlessly'. On the schedule the observer noted the exact times throughout
the day that Mary started and stopped wandering. (See Anne B. Smith,
Understanding Children's Development, Sydney: George Allen & Unwin,
1982, p.45.)
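
The calculation from noted start and stop times can be sketched as below. The clock times are invented for the illustration (four episodes within a two-hour observation, chosen so that the result is one-third); they are not taken from the published schedule, and the function names are assumptions.

```python
from datetime import datetime


def minutes_between(start, stop):
    """Minutes from one 'HH:MM' clock time to a later one on the same day."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / 60


def percent_of_time(episodes, observed_start, observed_stop):
    """Percentage of the observed period taken up by the recorded episodes."""
    spent = sum(minutes_between(start, stop) for start, stop in episodes)
    total = minutes_between(observed_start, observed_stop)
    return 100 * spent / total


# Hypothetical start and stop times noted for one category of behaviour.
wandering = [("09:05", "09:15"), ("09:40", "09:50"),
             ("10:20", "10:30"), ("10:45", "10:55")]

share = percent_of_time(wandering, "09:00", "11:00")
print(f"{len(wandering)} episodes, {share:.1f}% of the time observed")
```

Reporting both the number of episodes and the percentage keeps apart two different questions: how often the behaviour starts, and how much of the child's time it takes.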

8. Trait rating

This approach is much less precise than many of the others, but still has
a place, provided of course that it does depend on careful observation
rather than 'hunch'. A child is observed, probably for a specified time,
and then given a rating on a given trait, for example, is rated 3 on a 5-point
scale for 'friendliness'.

Trait rating was used, for example, in a study by Walter Emmerich
('Evaluating alternative models of development', Child Development, 1977,
v.48, pp.1401-1410).

It should be noted that each rating was based on approximately 30 minutes'
continuous observation of a target child within a free play or small group
context. Following each such observation, the observer immediately
completed a rating schedule. This included 14[...] unipolar scales, explicitly
defined by a manual (p. 1405).

Nevertheless, it is clear that trait rating is still likely to be unreliable (see
David Y. Shuller and J. Regis McNamara, 'Expectancy factors in
behavioral observation', Behaviour Therapy, 1976, v. 7, pp. 519-527). It
is a technique to be used very carefully and preferably only to supplement
other approaches. We have very clear research evidence that even teachers,
busy observing pupils every day, can, for a host of reasons, rate children's
traits very inaccurately. Ask early educators who team-teach in the same
centre to rate children and you will cause much debate because each will
perceive the children's characteristics differently.

Some Further Considerations 

1. The definition of categories

The most fundamental step in observation is specifying as precisely as
possible the behaviour to be observed. Vague or generalized descriptions
lead only to frustration: nobody else can be really clear about what is
being observed, comparisons or progress checks are not reliable, and the
observers themselves are not sure whether the behaviour fits in one
category or another. 'Playing', as a description, for example, is most
inadequate; 'playing with wooden blocks' is a little more precise; 'stacking
wooden block on top of two similar ones' is even better. The precision
required may vary, but in any observation there are two fundamental
rules:

(a) you must start out to observe and record only behaviour which can
be clearly seen and/or heard;

(b) you must differentiate between similar types of behaviour if they are
likely to be confused. Consider, for example, physical contact between
young children which may be quite different depending on whether pushing
or hitting is involved.

2. The number of categories

The number of categories you can handle reliably varies according to the
clarity of definition, the technique for recording and your experience.
Observations are more likely to be accurate and reliable if there are few
categories.

How many is too many? Studies of behaviour analysis use about four
to eight categories; for example, a classroom study used one 'on-task' and
six 'off-task' types of behaviour. (J.D. Thomas, F. Pohl, I. Presland, and
E.L. Glynn, 'A behaviour analysis approach to guidance', New Zealand
Journal of Educational Studies, 1977, v.12, pp.17-28.) A number of child
development studies report observations with many more. For example
the very important study of mother-child interaction by Alison Clarke-
Stewart observed 26 maternal and 23 infant types of behaviour, using
an event recording approach. (K. Alison Clarke-Stewart, 'Interactions
between mothers and their young children: characteristics and
consequences', Monographs of the Society for Research in Child
Development, 1973, v.38, nos.6-7.) With so many categories to be
observed, there needs to be a great deal of training and practice for
observers, to ensure satisfactory levels of agreement between different
observers.

The most important consideration when undertaking observation studies
is making sure that what is being observed is important, and likely to be
of value to the teacher, parent or whoever is undertaking the study. In
some cases, then, the question of accuracy does not arise: the diary
description given earlier is one parent's record of some aspects of her child's
development, and its value lies in being a personal record. It makes no
attempt to be a thoroughly complete documentation of the whole of B's
development. But where observational records are to be used for specific
purposes, every attempt must be made to reach a level of accuracy and
reliability.

Accurate observations require clear, unambiguous definitions, for it must
be possible for the observer to record exactly what the behaviour is. If
'aggression' is to be recorded, for example, it may be that accidental
contact, or jostling, needs to be separately considered. Observers must
'get it right' each time.

Reliable observations require that whoever is observing records the
behaviour consistently across several sessions. You can be reliable, but
wrong, of course. However, that is easier to fix up than being unreliable.

The issue of accuracy and reliability is very large, and cannot be
considered in full here. Teachers or parents usually want to use observation
as a basis for changing something. It is sufficient for them (i) to practise
until they are confident they know the techniques, (ii) to check their
definitions of each type of behaviour with others who are interested, (iii)
to check the extent to which they and another competent observer agree
about how each type of behaviour is to be recorded before beginning to
record, and (iv) to have other occasional checks during the course of any
large series of observations. Inter-observer agreement, in its simplest form,
means that two observers independently make a record of the behaviour,
and then check the extent to which they agree. Where categories of
behaviour have been checked, the task is easier, as the agreements can
be counted. A common procedure is to apply this formula:

          number of agreements x 100
-----------------------------------------------
number of agreements + number of disagreements

This gives a percentage agreement. Usually 85% or so is considered
an acceptable level of agreement. Where observations are part of a research
programme much more needs to be considered, but for classroom or pre-
school practices, the above should be sufficient.
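
As a worked illustration of the formula, the short Python sketch below compares two observers' records interval by interval. The function name and the sample records are hypothetical; the sketch assumes both observers coded the same sequence of intervals using the same category labels.

```python
def percent_agreement(record_a, record_b):
    """agreements x 100 / (agreements + disagreements)."""
    if len(record_a) != len(record_b):
        raise ValueError("both observers must code the same intervals")
    if not record_a:
        raise ValueError("no observations to compare")
    agreements = sum(a == b for a, b in zip(record_a, record_b))
    disagreements = len(record_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)

# Hypothetical records: ten intervals coded independently by two observers.
observer_1 = ["on-task", "on-task", "off-task", "on-task", "off-task",
              "on-task", "on-task", "off-task", "on-task", "on-task"]
observer_2 = ["on-task", "on-task", "off-task", "on-task", "on-task",
              "on-task", "on-task", "off-task", "on-task", "on-task"]
print(percent_agreement(observer_1, observer_2))  # 90.0
```

The observers disagree on one interval out of ten, so agreement is 90 per cent. A simple percentage like this takes no account of agreements that would occur by chance, which is one of the further matters a research programme would need to consider.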

4. The effect of an observer's presence

It is generally agreed that being watched may make for unusual behaviour.
This is more likely to be the case when, for example, in a family home
the observer is unknown to either the child or the parent. It is much less
likely if the observer is in a class or centre that adults often visit. However
there is no way of telling how much difference observers make in different
settings, and so only general guidelines can be given on the topic.

First, it is important that the observer remains as unobtrusive as
possible. This may well conflict with the necessity of being close enough
to see and hear the behaviour which is being observed, but
'unobtrusiveness' is an attitude as much as anything. It is clear that an
observer cannot both interact with the child (or person being observed)
and then change back to the impartial role. Observers should also avoid
the mistake of paying obvious attention to any particular behaviour: it
is likely that a sudden show of activity on the part of the observer when
aggression (or whatever behaviour is being observed) occurs, will increase
the chances of that behaviour occurring again.

Second, it is preferable for observations to be spaced over a reasonable
period of time. It is probable that any effect arising from an observer's
presence will be greatly decreased over time, as the subjects become
accustomed to his or her presence.

Third, it is also probable that the effects are much reduced when the
subjects are younger: babies and infants are unlikely to change their
behaviour solely in response to the presence of an observer. However,
they will respond to many 'irrelevant' changes. If a young child's behaviour
is being observed in the presence of his parents, for example, and the parent
changes his or her patterns of interaction, that will almost certainly change
the picture for the child's behaviour as well.

Finally the whole context in which the observations are being conducted
needs to be considered. Children in primary schools in the cities are
notoriously familiar with the visits of student teachers, and tend to ignore
them. That may well be an asset for the observations the students have
to do. On the other hand, children in playcentres and playgroups are used
to having several parents around during sessions, and know that they can
call on their help, or simply converse with them. Such a familiarity makes
it very difficult for a playgroup supervisor to be unobtrusive whilst
observing.

Another important consideration is the use to which the observational
material is to be put. In the playcentre context again, this is easily accepted
as 'Mrs Smith is doing her observations again'. Little further explanation
is required, either by the children or by the other parents. However, where
a different kind of observational approach is used or where any
observations are a novelty (as in research observations in a family) it
is very important that the reason for the observation is explained, and
reassurance given that no particular change in behaviour is required.

All observational reports need an account of the various factors which
may affect the child or the setting. Thus the date and time of the
observation and a brief description of the setting (room arrangement or
play area) should be attached. The weather may also be an important
factor, especially if specific kinds of play are being observed over a week
or so.

Where the observations are being used to check the effects of changes
in teaching or setting, these ecological considerations are terribly
important, for otherwise there is the danger that changes may be wrongly
attributed to what the observer wants to see. Even the number of adults
present at a certain time may radically alter patterns of play or other
activity and so these factors should also be recorded.

A number of writers insist that observations can only be undertaken
in the natural environment and that no specific stimulus should be
introduced. But as we have noted the presence of an observer has already
changed the environment. So does the time of day, in some cases. (For
example, observations of very young children are likely to be greatly
different if they take place in the morning, and the late afternoon.) Many
observational reports are interested in what happens when some specific
play material is introduced, or when television programmes are shown,
or when there is one of any number of contrived stimulus situations.
Provided these stimuli are referred to in reporting the observation, there
is no reason why they should not be used.

6. A note about ethical concerns

Observations should only be undertaken when two conditions are met:

1. Permission of the person observed (or parent if necessary) has been
obtained.

2. The material recorded as a result of the observation is kept confidential.

In both cases, this means that the child, parent or teacher needs to
have the reason for the observation spelled out in a way that they can
readily understand; and proper safeguards for confidentiality should be
explained before the observation is done.

If you fail to get permission and do not keep the results confidential
it is not surprising if the people you want to observe refuse to let you,
or others, ever observe again.

Video Recording for Observing

If you have a well made video recording many of the techniques described
above can be applied to analyse the behaviour you have captured. And
you can use first one technique and then another. This takes away some
of the criticisms about how selective observers' views can be.

The use of a video recorder, as a technology to aid observations, is
a major topic in itself and a full item in set is planned for a future issue.



The Authors 

Dr Bruce McMillan is a Senior Lecturer in Education at Otago University,
Box 56, Dunedin. New Zealand. 

Dr Anne Meade is a Research Officer with the New Zealand Council for 
Educational Research, Box 3237, Wellington, New Zealand. 






An NZCER Information Service

set, Item 4, 1987

ONE EXTREME TO THE OTHER:
A report on Profile Reports







One Extreme to the Other:
A Report on Profile Reports 



By Graeme Withers 
ACER 



[Figure: James K's Profile on 5 Traits. A graph with Score (0 to 10) on
the vertical axis and Traits (X1, X2, X3, ...) on the horizontal axis.]







Both these profiles follow someone's definition of what a 'profile'
is. The most common 'profile', however, describes
the PERFORMANCE of ONE student in SEVERAL skills,
during ONE year.

The enormous differences between what one teacher means by a
profile and what another means lead to confusion among
teachers, parents, employers, and officials. This set item will
display some of the different types of assessment all flying the
banner PROFILE, and make recommendations about them, and
their use.



Sue K's Profile in Drama



Sue has shown an unsatisfactory level of
achievement in Year 12 Drama.

She did not complete all of the required research
and work that was completed was superficial and
limited in scope. She could not articulate ideas
clearly or analyse and objectify the drama
experience.

She failed to complete any of the mask designs in
connection with the performance and her lack of
commitment to this project severely hampered the
other students with whom she was working. She
consistently isolated herself from the processes
involved with the performance and showed little
flexibility in working with others.

She attended the required number of theatrical
performances but her understanding of the elements
of the art was extremely limited.

Her expressive skills showed some improvement but
her exploration of ideas in workshops lacked
imagination and spontaneity.

She was unable to work supportively with other
students or take responsibility for initiating
activities.



What the Profilers Practice 

Just as different schools and colleges produce extremely different
pieces of paper at the end of their profiling, so do the
philosophies of the profilers differ. Two sets of ideas are at work.

1. What ATTRIBUTES should a profile have?

Some will say a true profile is no more than a set of co-
ordinates tightly described.

Others will say that a true profile is a descriptive statement in
words, and unquantifiable.

Yet others will say a true profile lies somewhere between
these extremes.

2. What FUNCTIONS should a profile have?

Some people will stress competitive practices

• administration (school decision making, record keeping)

• selection (employers, further education)

• guidance (diagnostics, counselling)

• information (to students, parents)

• motivation

• discipline

Others will stress non-competitive practices

• enabling (better curriculum and course planning)

• developmental (a learner's knowledge and sense of
autonomy)

• co-operative (interpersonal learning and relationships)

• assistance (to the learner about his/her learning)

• placement (courses, jobs, further education)

In the descriptions of seven basic types of profile which follow
you will see these differing philosophies about the attributes and
functions of profiles coming through. We will leave judgements
about them till later.



Seven Profile Types 

1. Feed-back assessment and reporting about
courses

In some educational systems, a set of trial examinations is con-
ducted at the end of the first or second terms to give a percentage
score for each subject. These scores are not counted towards
eventual success or failure but are expected to help motivation,
guidance and discipline (see the competitive functions listed
above). At a basic level, the results provide feedback for con-
tinuous improvement and revision.



Figure 1 comes close to being a [...]
report certificate of a New Zealand school. This
frame is accompanied by similar frames for each of the other core
subjects, and sections (allowing comment only) on [...].
The whole document is a profile procedure; so too is Sue
K's Profile in Drama, on page 1, a profile of English skills. The
value of the report from a formative feed-back point of view will
depend on the nature and content of the comments: how student-
oriented, positive and specific they are.

An important distinction between Type 1 and other procedures
is made by the school in introducing the report:

This report is intended to advise parents and to assist pupils in
their academic, personal and social growth. In addition to the
subjects listed on the inside of this report, all pupils take seven
short courses.

This report is not designed to assist employers in selection
procedures. Pupils seeking employment should apply for a
College Record and Reference Certificate which summarises
their major skills, attitudes and personal qualities at the time of
writing. This should be adequate for the needs of employers.



[Figure 1 form: FORM 3, First/Second HALF YEAR. Headings: Writing,
Reading, Speaking, Listening, Presentation; ATTITUDE; COMMENTS;
TEACHER. Grades: A Excellent, B Very Good, C Satisfactory,
D Causing Concern.]

Figure 1: A simple subject profile



A second example for Type 1 is Figure 2, from an Australian
high school. The assessment basis is the specific work com-
pleted (note the report on each of six, numbered, contracts),
and it contains a section outlining possible contributions by
parents. If the contracts are a profile of learning so too is this
document, reporting the work completed.

2. Reporting of performance scores in course work

Figure 3, from an Australian technical high school, offers a
specimen profile of Type 2. Space is left in the slanted section of

[Figure 2 form: six numbered contracts of work, each ticked when
completed, followed by the teacher's comments on the term's work.]

Figure 2: A work-completed profile statement



the table for inserting the particular sections of the course seen as
relevant. There is also room (not represented in the Figure) for
extended comment of a descriptive kind to summarize achie-
vements in other sections, and on performance in the course as
a whole. Space is also allocated for detail of what work was done
in the course and how the judgments of quality were reached.

This is perhaps the most common type of profile. Ticking a box
and entering brief commentaries are easy and quick and popular.
However, how adequate is a multi-point continuum to express the
full range of behaviour or attributes under each heading? That





[Figure 3 form: REPORT TO PARENTS; a grid of tick boxes with space
for comment.]

Figure 3: A grid-and-comment course profile



remains to be demonstrated. Nevertheless, it does represent an
improvement over a statement such as 'Clothing Construction l(;



3. Whole-course reporting

Figure 4, from another Australian technical secondary school,
stands as an illustration of a large number of specimens which
attempt to report a whole year's work by a student in a single
multi-faceted statement. The lines on the right, extended, are for
comment by subject and home-room teachers to amplify the
judgments indicated by ticks in the boxes.

Figure 5 narrows the time focus to four weeks, and expands
simple subject-based assessments to broader, inter-subject
assessments. This profile derives from a highly specific Youth
Opportunity Programme devised by the City and Guilds of
London Institute, and has been used as a model in other places.
Figure 5 is completed by shading in the relevant parts of each
numbered, horizontal line.



[Figure 4 form: headed 'Queens ... Technical School', with rows for
subjects (HUMANITIES, MATHEMATICS) and columns of tick boxes.]

Figure 4: A grid-and-comment profile of a year's work



[Figure 5 form: PILOT SCHEME. PROFILE REPORT - FOUR WEEKLY REVIEW.
'This profile shows the levels which have been reached during the last
four weeks and the learning activities which have taken place.'
ATTAINMENTS IN BASIC ABILITIES: numbered horizontal lines of graded
descriptors, with headings including WORKING WITH THOSE IN AUTHORITY,
SELF AWARENESS, TALKING AND LISTENING, INFORMATION SEEKING and
EVALUATING RESULTS, 'and similarly for communication, practical and
numerical abilities'. N/O: No opportunity to assess. Name of Scheme:
YOP Community Playgroup. Period covered by this review: 1 month,
26.2.82 to 23.3.82.]

Figure 5: Profile of performance on a whole course



[Figure 6 form: SKILLS (headings include VISUAL UNDERSTANDING and
PHYSICAL CO-ORDINATION) and SUBJECT/ACTIVITY ASSESSMENT.]

Figure 6: Another profile of performance on a whole
course



A fourth specimen (Figure 6) comes from trials in 1977 by the
Scottish Council for Research in Education. Criterion statements
about basic skills and assessments of personal qualities are
checked using boxes, and achievements in individual subject
areas are recorded using a norm-referenced four point scale.



4. Student self-records continuously available

A learning management scheme using negotiation between
students and teachers, contracts of work, and further nego-
tiation of the assessment statements which will record and report
that work, will produce quite different profiles from Types 1 to 3.
Except in Figure 4, objectives, frameworks, grids, ticks, grades and
text have so far been totally teacher directed: even in Figure 2
only the contracts, not the profile report itself, involved students
directly. (Figure 5 had a line assessing 'Self Awareness': one
wonders how that can be calculated from the teacher's side of the
desk.)

There are two famous instances of this sort of profile. One
(really a pair) is the Swindon Record of Personal Achievement
(RPA), begun in 1970 in England, and its successor, the Record
of Personal Experience, Qualities and Qualifications (RPE),
begun in 1974, in Devon. The other is the Schools Sixth Form
Tertiary Entrance Certificate (STC), an alternative study structure
to formal public examinations, operating in some schools in
Victoria, Australia since 1976. The RPA/RPE pair involved each
student recording events, achievements and experiences with
considerable flexibility. Each entry had to be attested by an adult
by way of verification. The whole turned out to be an idiosyncratic
yet highly reliable and valid report on the years of school during
which it was compiled.




The STC system makes self-reporting optional. There are
negotiations of what is to be learned, with selection of objectives
and content; there is also negotiation in framing and wording the
content of the assessment. A self-assessment statement in the
record of achievement is permitted and, in some schools, actively
encouraged.



5. School-long assessment portfolios; continuously
available and updated

Types 5, 6 and 7 have been called macro-initiatives that
extend the depth and breadth (as well as the sheer bulk) of the
report being offered. An Inner London Education Authority
scheme gathers together a portfolio of (i) results in public exami-
nations, graded tests and classroom tests, (ii) contributions
from teachers, (iii) pupils and (iv) parents. All this is testimony to
the directions, achievement and progress of the individuals con-
cerned. Such a document, if published, may well be a transcript
of a school's whole record of the passage through its curriculum
of a particular student, warts and all. Two offcuts from such a sub-
stantial bulk of material are described as Types 6 and 7.



6. Pre-transition summaries of achievement;
available when the student leaves school

As students leave school they might take with them the summary
of the full Type 5 portfolio profile, recording only the very latest
(and hence most reliable and valid) of that vast mass of formative
and summative statements. The profile will cover the whole range
of curriculum experiences but in a shorter form.



A notable example is the Oxford Certificate of Educational
Achievement (OCEA):

The P-component will take the form of a personal record,
compiled by the student in consultation with a teacher, which
draws on the formative experiences articulated by the student
and teachers in all curriculum areas and also such expe-
riences beyond the formal curriculum. The G-component will
be a detailed statement of what the student has achieved.
These achievements will be defined by explicit criteria. The
student will have progress recognised as it occurs and will be
able to identify learning objectives and negotiate progress
within the curriculum. The E-component will record all external
examination results and shows how OCEA's three viewpoints,
which are expressed in its three components, are each implicit
in every subject.




Figure 7: The P, G and E Components in the OCEA



Figure 8: Student application for a school reference
certificate



Many people would not call it a profile at all. However, the
OCEA has supporters:

Most 'profiles' concern themselves with the products of edu-
cation - reports, qualifications and certificates - rather than
educational processes. The P-component however is about
both process and product.

Another example comes from a school in the southern hemi-
sphere. This school issues a reference certificate which is in
effect a statement, in summary form, of teacher perceptions.
Figure 8 is the student's application. Figure 9 is the report filled in
by each of the three staff members 'who know you well'. The exi-
stence of such un-moderated inventories of pupil 'characte-
ristics', in a computerised world of perpetual storage, is alarming,
to say the least.



7. Vocationally-oriented achievement; reported when
the student leaves school

A Type 6 summary, from a portfolio perhaps, when providing
only information which is relevant to the student's intended
vocation or likely vocations, becomes Type 7.

I cannot find an exemplar. When I began to look around my
large collection of profiles from five countries I could not find one
which was strictly vocationally oriented. The closest I came was
Figure 3, already used for Type 2: this, at least, told the Rag Trade
what Student X had done in a Clothing Construction course. But
it didn't (and was not intended to) communicate details of cur-
riculum, nor was its assessment fitted to the interests of the broad
range of potential employers which Student X might have had in
mind on leaving school. Admittedly some courses are directly
vocational, such as the one which yielded Figure 6, but 'vocation'
seems to be in the name of the Scheme, not the profile nor the
capacities it represents.

Figure 9: Referee's response sheet for a school
reference certificate



Undoubtedly such profiles exist. But where? In the files of
career guidance officers probably, the results of students taking
tests and 'career guidance inventories'. The profiles are
unrelated, therefore, to the mainstream of what the students have
studied and what has been assessed in the school. Like public
exams, these guidance inventories are often detached from the
mainstream of school life.



Eight features, generally agreed, which make up
a 'good' profile



1. A profile summarises performance on a sequence
of instruction.

It may incidentally offer diagnostic insights, but this is
not its prime function.

2. It must report both academic and non-academic
outcomes or achievements.

This feature is often questionable, and questioned by
those who are shy of reporting 'effort', 'achievement'
and other such attributes.

3. It must have more than one point.

For 'point', one might read 'element', 'trait', 'skill',
'objective to be reported', or some such alternative.

4. The 'points' are neither valid nor reliable if added, or
otherwise put together.

And this must be clearly indicated to the user.

5. No letters or numbers are reported without an
accompanying key.

The key will, ideally, be directly related to the content and
process objectives of the learning.

6. The objectives and format should ideally be rec-
ognised and used throughout the education
system; the document should be part of a formal
system of certification.

7. The complexity aimed at should shape the layout,
etc., not just some administrative convenience.

8. The complexity aimed at should not compromise
the reliability of the profile as a whole.



Warnings 

Warnings about the nature, content and limitations of
profiles in general:

A simple checklist

For students, parents, prospective employers, guidance officers,
university selection panels, and other tertiary educators.
Does the profile:

□ name in full the student?

□ carry a signature, stamp of authority, and date?

□ record the year(s) of schooling to which it refers?

□ record the course(s) to which it refers?

□ give a guide or key as to the meaning of any numbers (marks)
or letters (grades) used?

□ record the work that was done, in a way that any skills or achie-
vements referred to can be related to something concrete or
practical?

□ indicate who the authors are (student/teacher(s)), and who the
moderators are (other students, teachers, principal, etc.)?

Another checklist 

This one is for all of the above, plus educational administrators,
theoreticians, researchers, and politicians of any persuasion.
Does the profile:

□ represent an intrusion on the personal privacy of the student?

□ remain an unmoderated statement by one or a few person(s)?

□ refer to a formative, feed-back assessment, when it is to be
used as a summative, marks only, statement?

□ record a latest judgement which is more than two years old?

□ record 'effort' and 'attitude' without details of how these
matters were displayed in the classroom?

If it does any of these, discard those sections of it, and view the
rest with suspicion.



Some important considerations

For anyone who designs, institutes, evaluates or is an audience
for a profile.

How does the profile match up against the following opinions? 

1 Profiles are progress reports, but there must be a place in any
summative (marks only) profile for recognition of changes in
the student since the last formative (feed-back) assessments
were issued.

2 Profiles can report continuous assessment, but this must not
be confused with, or replace totally, provision for a summative
evaluation. Remember that an external unmoderated
examination is not on its own an adequate summative assessment.

3 A profile should be more than just 'the disaggregated reporting
of examination results'.

4 Profiles might purport to record 'mastery' but there are some
fundamental implausibilities in such a concept - it is not
generalisable across persons; not continuously distributed across
skills within one person; not generalisable from a base ability
to a set of its component sub-abilities. What then is 'mastery'
if it is recorded in a profile?






Profiles and Schools 

Profile construction is an issue belonging firmly to schools. Here
is an unedited and genuine letter, reprinted with permission, but
anonymous by request, from an Antipodean parent to a principal
about the institution of a continuous assessment and profile
reporting system in his sons' school. It reveals in short form but
with, I believe, superb clarity all the issues which might well
disturb us.



Dear Jack

Third and Fourth-Form Reports

These are a few queries I have, and comments that I'd make off the top
of my head, about the new form of reporting:

(i) There seems to be a definite move away from tangible verifiable
referents to more vague and subjective ones. Although the
impression created is one of comprehensive information (many
report forms and many headings) which is apparently quite precise
(5-point scale of grades), I doubt if parents can take it all in and
whether, in fact, the assessment is really as comprehensive and
precise as it appears when the reports are subjected to close
examination.

(ii) If pupils' progress is measured 'in relation to their own abilities', who
assessed these abilities? How were they assessed (what measures,
tests, systematic observation etc.), particularly the attitudes, effort,
participation aspects which are heavily weighted in some subject
areas to the exclusion of cognitive outcomes? Are the assessments
of proven validity and reliability? When were the abilities assessed?
(i.e., how recent is the information, particularly for 3rd Forms?) If
these questions cannot be answered satisfactorily for all pupils on all
criteria appearing in the report, then what follows in terms of the
assignment of grades is likely to be highly inaccurate and
misleading. In other words, if the teacher's assessment of abilities is
inaccurate (either too high or too low) then damage will be done
(pupils pressured to attain the unattainable; pupils achieving A
grades without effort) and assessments will be awry because the
baseline estimate was wrong. How confident can I be as a parent
that the teachers in the various subject areas have accurately
assessed my child's abilities, not only in cognitive areas, but in
affective areas as well?

(iii) Presumably, a teacher having assessed a pupil's abilities will have
some expectation about the progress such a child should make over
the months covered by the report, i.e., a child of n ability should make
progress over X months (other things being equal). The teacher is
probably using internalised norms based on previous experience.
But these could vary markedly from teacher to teacher and such
assessment would be extremely difficult for inexperienced teachers
with limited exposure to representative samples of 3rd and 4th
formers, and particularly suspect in estimating the progress of very
slow or very fast learners. What moderation is there of 'teacher
expectancy in assessing pupil progress'? What does Teacher A
expect in contrast to Teacher B, both of whom are supposed to be
assessing the same thing, but who are using different criteria
(internalised, not stated, unexaminable) and who have had different
teaching experiences? Is the Science teacher's notion of excellent
progress for a bright cooperative kid the same as that of the teacher
of German? Is the A for Social Studies equivalent to a C for
Mathematics? In other words, assessment is now entirely
teacher-specific.

(iv) The Newsletter seems to indicate that progress is the result of 'hard
work' on the part of the pupil, of building on established skills and
developing weak areas. The onus appears to be entirely on the pupil
to make progress in all facets of his school life. There is another
component to the formula, of course. Suppose we have a pupil of good
ability (however assessed) who is making mediocre or poor
progress. Should the parent not ask the teacher why this is so? What
responsibility has the teacher in ensuring that progress is
commensurate with ability? If many pupils are making poor progress
then maybe it's the teacher who should be looked at? Possibly his
teaching is inefficient and/or ineffective, or pupils are inefficiently
motivated and/or interested to achieve at an optimum level in his
class. Presumably, if all teachers have matched their teaching styles
with the pupils' learning styles, pitched their lessons at the right level,
chosen appropriate and interesting material etc., etc., and provided
excellent examples as adult models in terms of attitudes, etc., then
all pupils should be shown to make progress over time. How likely is
this? You and I know it's extremely unlikely, even if we discount that
learning goes in fits and starts and that many of the outcomes are
long term, particularly in trying to change attitudes and entrenched
behaviours which may be deemed undesirable. So, what will a
parent be anticipating in the second report? What should he expect
if the teachers are doing their job?

(v) The aims stated at the top of the report forms are vague, woolly,
typical of the kind which appear in most of our syllabuses. They are,
in most cases, long-term ultimate goals which will not be achieved in
the time span covered by the reports. Do you think they add anything
for parents? If the teachers consider that aims should be included,
why not have a statement of what was attempted specifically for the
three terms covered by the report and how well each pupil achieved
the short-term objectives?

I could continue at some length, Jack, but I don't intend to, although I'd be
happy to elaborate if you wish. While, as a parent, I applaud the school's
efforts to provide me with a comprehensive picture of my children's
progress, I don't think I'm much the wiser, particularly in regard to
academic achievement. And as an ex-teacher I'm sceptical about so
many baldly-stated criteria which I know are exceedingly difficult to assess
validly and reliably. Perhaps I'll be pleasantly surprised when I talk to
subject teachers next week.

Kind regards



Copying Permitted 







Non-Verbal Tests in Schools 






By Cedric Croft 
NZCER 



Non-Verbal Tests

The term 'non-verbal test' is used to describe a range of
paper-and-pencil tests designed to tap a selection of cognitive
processes that are unlikely to involve verbal language. This
does not mean that verbal instructions and verbal strategies
have been entirely eliminated for all those who take these
tests; it simply means that no words are included in the tests,
the test content is of a non-verbal nature, and the responses to
this content are unlikely to involve language.

Examples are the Standard Progressive Matrices, Jenkins
Non-Verbal, ACER Junior Non-Verbal, the IPAT Culture
Fair Tests and the NFER Non-Verbal Tests. All of these tests
attempt to measure general intellectual skills of a non-verbal
nature by utilizing shapes, patterns, diagrams and sequences.
The use in schools of tests such as these is the focus of this
article.

Performance and Apparatus Tests

There is a whole range of performance or apparatus tests
widely used in psychological assessment which involve
non-verbal answers. The best known tests containing
performance items are the Wechsler Intelligence Scale for Children -
Revised (object assembly, block design), the Revised Stanford
Binet Form LM (bead threading, paper cutting) and the
British Ability Scales (block design, rotation of letter-like
forms). The content of performance or apparatus tests is
basically non-verbal, but the one-to-one administration of these
tests is a highly verbal process. It is the concrete apparatus
that leads to performance tests being classified as a category
that is separate and quite distinct from non-verbal
paper-and-pencil measures.

Non-Language Tests

Some writers make a valid distinction between the true
non-verbal test, and others that can be termed non-language,
non-reading or pictorial tests. This distinction is useful, as
non-language tests require no verbal language at all, and can be
used for foreign-speaking, deaf and illiterate subjects. It can
be misleading to call this sort of test non-verbal, since the
term should only apply to the test content and not the subject's
behaviour. Tests of verbal comprehension and vocabulary can
be administered through the use of pictorial content, the
outcome being that underlying verbal abilities are measured
by tests with a content devoid of verbal material.
Non-language tests are rarely used outside a cross-cultural setting,
because in other settings examiners and subjects usually have
some knowledge of a common language. The Army Beta
Tests, previously used in large scale induction and recruiting
in the United States, the range of tests used for selection in
South African industry and the Queensland Test, are
prominent examples of non-language tests.

The Characteristics of Non-Verbal Tests

Non-verbal tests:
(i) have a non-verbal content;
(ii) are in paper-and-pencil format;
(iii) have no apparatus;
(iv) are suitable for group administration;
(v) do not incorporate writing responses;
(vi) use oral language in administration;
(vii) tap cognitive processes unlikely to involve verbal
language.

Why are Non-Verbal Tests Used? 

The short answer is that non-verbal tests are seen as measures
of 'ability' unconstrained by language, and because of this, it
would seem that they can be used to measure cognitive
functioning without being, as most tests are, dependent on
language achievement. Non-verbal tests are thought to be of
most value for testing children who are non-native speakers of
English, or children whose measured verbal attainment is
fairly minimal. But this takes a number of points for granted.
What are the major assumptions underlying the uses of
non-verbal tests in schools? Can these assumptions, and hence the
uses of the tests, be justified?

Assumption 1: Non-verbal tests tap a set of thinking skills
basic to all intellectual functioning, so they are measures of
general mental ability.

It is wrong to assume that non-verbal tests measure the same
cognitive functions as verbal tests, no matter how similar they
appear to be. Spatial analogies are more than a non-verbal
version of verbal analogies. The form of the relationships
differs, the elements that lead to the perception of points of
similarity differ and the level of thinking used to deduce the
relationship is set at different levels.

Tests like the Standard Progressive Matrices and other
similar non-verbal tests, have been designed to measure a broad
selection of reasoning tasks and abstract conceptualization,
but factor analytic studies have indicated that separate
non-verbal factors are the greatest contributors to the scores. This
suggests that the non-verbal abilities being sampled by these
tests are largely distinct from the general verbal-educational
g, v:ed factor being measured by verbal tests. However, it is
probably misleading to think of verbal and non-verbal
abilities as being entirely distinct. Verbal and non-verbal
abilities are aspects of the broader group of skills now
commonly referred to as 'scholastic aptitude' (formerly
'intelligence', 'general intelligence', 'general ability', 'mental
ability', 'general mental ability'). It is probably more realistic to
think of these abilities as two broad divisions of human
intellect, composed of a number of identifiable skills, with some
general elements in common.




Studies of the relationships between measures of school
achievement, scholastic aptitude tests and non-verbal tests
also shed some light on the relationships between non-verbal
tests, and general intelligence. Typically, achievement tests
sampling predominantly verbal skills (reading
comprehension, vocabulary, study skills, spelling, writing skills)
correlate more highly with measures of general intelligence
(coefficient illegible) than with non-verbal tests (.60).
Mathematics involving problem solving also relates more highly
to verbal than non-verbal tests, but aspects of mathematics
stressing spatial skills, i.e. geometry, show a more positive
relationship with non-verbal tests.

Assumption 1 cannot be supported. Non-verbal tests are not
measures of general mental ability. Non-verbal tests are
measures of the broad domain of non-verbal abilities, and cannot
be used as valid measures of the intellectual skills associated
with most of the highly verbal tasks commonly encountered
in much classroom learning.

Assumption 2: Non-verbal tests are more valid measures of
the school potential of the low-achiever than verbal tests.

Non-verbal tests are most certainly valid measures of
non-verbal abilities, but this class of abilities is little utilized in
most school learning. Non-verbal tests do not lack validity per
se, but their validity is suspect when they are used to predict
possible future achievement in verbal areas of the school
curriculum. When the tests are used to predict verbal learning,
there is a mismatch between the skills measured by the test
and the abilities that underlie the learning. This cannot
enhance test validity, as the tests are measuring something
different from the skills and abilities underlying the intended
achievement.

There is a widespread belief that poor readers who score
well on a non-verbal test are likely to succeed in a remedial
reading course as they have demonstrated a potential
previously untapped by verbal measures. They may of course
show very gratifying progress, but it is unlikely that such
progress is due to the presence of the abilities measured by a
non-verbal test. Reading is obviously a highly verbal set of
skills and the abilities underlying this process are similar to
those measured by verbal tests. There is little basis for the
belief that non-verbal tests are satisfactory predictors of reading
achievement, particularly of comprehension skills.

Where non-verbal test performance is very much better
than verbal test performance it is tempting to infer that the
non-verbal test is the more valid measure of underlying
abilities, and that results from the verbal test represent a form
of underachievement. It is then surmized that, given a
different set of environmental circumstances, performance on a
verbal test may have been nearer to the results of the
non-verbal test. However, if you want to estimate an individual's
present functioning and the likelihood of their progress in
reading in the near future, a verbal test is best. It will tap present
accomplishments in the skills that underlie reading. The
hypothesis advanced is tempting, and widely differing
scores may suggest a need for long term intervention, but for
the purposes of planning immediate short term needs,
verbal tests provide the more valid information.

There is one important exception to this general
conclusion. In the case of students whose English is limited, there
can be justification for using non-verbal tests. Verbal tests
will only be valid if the subject has had considerable exposure
to, and experience of, the language of the tests. If there was a
need to undertake an assessment of a recently arrived child
from Europe, Asia, or the Pacific, a child with little
experience of English, a non-verbal test could give some very
general notion of broad intellectual status. A verbal test in
English would have little or no validity. In a situation such as
this, it would be preferable to have any assessments made by
an experienced psychologist who would have access to a
range of valid tests.

Assumption 2 cannot be justified. Particularly in the short
term, and provided the individual has knowledge and
experience of the language medium being used, verbal tests are
better predictors of most school achievement than existing
non-verbal measures. Significantly better non-verbal scores may
indicate cases of verbal underachievement, but the type of
intervention required is beyond the resources of most schools,
and it is likely that the optimum stage for learning such skills
is well past.

If non-verbal tests are to be used successfully in the way
many teachers want, new tests with appropriate validating
criteria must be constructed.

Assumption 3: Non-verbal tests are culture free.

There is a large body of evidence to suggest that non-verbal
and performance tests may be more culturally biased than
language tests. Cole and Hunter found the WISC Performance
Scale to be as difficult as the Verbal Scale for a group of Negro
children in the United States, despite the apparent cultural
bias in many of the vocabulary, information and
comprehension items comprising the WISC Verbal Scale. Higgins and
Sivers found that, when the Revised Stanford Binet Form LM
and the Coloured Progressive Matrices were compared for a
matched group of 7-9 year old Negroes and Caucasians,
there were no significant differences in the Binet scores, but
on the Coloured Progressive Matrices the Negro group did
significantly worse. Vernon in 1965 reported that Jamaican boys
scored better on conventional verbal intelligence and
achievement tests, despite their linguistic disadvantage, than they
did on non-verbal tests that appeared to be a 'purer' measure
of general mental ability.

As non-verbal tests are apparently based on a white
middle-class conception of 'logical thinking', should they be
regarded as culture free? Cohen has suggested that there are two
basic cognitive styles, analytic and relational, both regarded
as independent of general mental ability, able to be defined
without reference to specific content but to some extent at
least, influenced by social and cultural factors. If Cohen is
right and the analytic cognitive style is implicit in non-verbal
tests developed by white middle-class psychologists, the
appropriateness of these tests for cultural groups that operate
with a relational cognitive style, must be questioned.

Vernon has suggested that the group of skills we refer to as
intelligence, or general mental ability, is bound up with
convergent problem solving, persistence, initiative and
efficiency. It is the type of ability well adapted to scientific
analysis, control and exploration of the environment,
large-scale and long term planning and carrying out materialistic
objectives. This has led to growth of complex social
institutions (nations, armies, multi-national companies, school
systems) but has been less successful in promoting solutions to
group rivalries, or harmonious personal adjustment, than
skills which would be called 'intelligence' by some cultures
we regard as more primitive. If Vernon's suggestions are
correct, and conceptions of intelligence differ from culture to
culture, there must be little prospect of being able to use tests
from one culture, as valid measures of the trait of intelligence,
within another culture.

Major writers in the field of psychological testing agree that
there is no such thing as a 'culture fair' or 'culture free' test,
especially since there is no universal culture that test items
can validly measure. 'Culture fairness' is not an either-or
attribute but rather a number of dimensions along which
various aspects of tests can range, and so is a matter of degree.

Assumption 3 cannot be justified. No test is culture free.

Do Non-Verbal Tests Measure Intelligence?

There is no simple, unequivocal answer to this question as it
stands, as any conclusions depend very much on your
definition of intelligence.

If intelligence is regarded as a broad grouping of cognitive
skills manifested by the ability to see relationships, deduce
similarities, solve problems, predict consequences, reason
logically and generally manipulate a variety of symbols and
the ideas they represent, non-verbal tests will be recognized
as sampling some of these skills. Consequently, non-verbal
tests will be viewed as a measure of some aspects of
intelligence.

How do these tests relate to school learning? In the
broadest sense, school learning is a result of interaction
between the intelligence of the learner and the school
curriculum. If we call the aspects of intelligence associated most
closely with school learning 'scholastic aptitude', we must ask
whether non-verbal tests are also measures of scholastic
aptitude. It is unlikely that non-verbal abilities are a
predominant aspect of the range of skills and abilities referred to as
'scholastic aptitude' because school learning is
predominantly verbal.

To return to the question: do non-verbal tests measure
intelligence? In the writer's view, non-verbal tests measure in
the broad domain of intelligence, but when we are thinking of
school learning we are most concerned with aspects of
intelligence that can be categorized as 'scholastic aptitude'. While
non-verbal tests are not comprehensive measures of
scholastic aptitude, they cannot be disregarded entirely, particularly
if it is accepted that non-verbal tests measure some aspects of
intelligence.

What are the Legitimate Uses of Non-Verbal
Tests?

Within the school context, non-verbal tests are most useful in
guidance. They can be an aid in determining the range and
strength of an individual's cognitive abilities, as a first step in
career planning. Not every student who seeks guidance about
future careers should be administered a non-verbal test.
These tests should be among the resources available to a
counsellor, along with interest inventories, tests of specific
abilities (e.g., computer programmer aptitude, clerical and
office skills, mechanical aptitude), tests of scholastic aptitude,
measures of attitudes and personal adjustment. Non-verbal
tests are no more, or no less important, than these other
categories of tests. The key to their utility is knowing when
they will provide valid measurements and when they can be
used with profit.

In the case of a student with mediocre verbal achievement
and significantly higher non-verbal test scores, a case could
be made for this student to follow a secondary or tertiary
course that utilizes non-verbal strengths, i.e., technical
drawing, practical engineering or building. The associated
difficulty is that although courses such as these utilize non-verbal
skills, much of the associated instruction is undertaken by
verbal means.

When you cannot use a verbal test, for example your pupil
has recently arrived from overseas and you have no tests in his
or her language, a non-verbal test may give you an
approximate measure of intellectual status and possible achievement
in the short term. It would be unwise to attach too much
weight to these results, but in this situation non-verbal tests
do enable a preliminary assessment of non-English speaking
students to proceed.

These tests, like any other, provide specific information to
be used in conjunction with information from a variety of
sources. Provided non-verbal test scores are accepted as
measures of non-verbal skills, they have a role to play. If they are
thought of as a measure of a broad general ability that
underlies school learning, they will be less useful. If they are
regarded as being predictive of highly verbal school learning,
they will be downright misleading.



Notes 

Evidence that non-verbal tests are not culture free can be found in
Cole, S. and Hunter, M., Pattern Analysis of WISC Scores

pp 4fi- 4f,s. 

I'ln viisv nNS{i»n »>1 u In nun x 4 i H tl U'^fv .tu' nn] lijhuft' lu'r -oc 
Iv^ts **t InU'HjvH'nti' ' \f>tifL.;t^ \nU'ji^j\^L*i'f^. \ **! '1 pp S 

^^n* h'w/^' /^\, //.'/- '^'/^r pp ^^h, Sipi 

Further reading

s.itikr K?otn; I" w .... Plul^nii-lphia: 

W V s.uiJhirr^ h?' 4 

nnllan- J*^^f> 






Does Intelligence Equal
Learning Ability?



By Jo Jenkinson
ACER



Introduction

Whatever subject is being taught, there will be a wide range
of individual differences in students' learning. Some will
learn more quickly than others, some will retain more.
Whether learning is measured by results on achievement
tests, by the number of units completed, by time taken to
progress through a programme, or by the amount of
time [...] the subject, but sometimes surprises occur and a
child who was thought to be somewhat dull will suddenly [...]

Some Definitions

[...] attributes of a person as are height and weight. Rather,
they are concepts which the psychologist uses to explain
performance on tasks demanding intellectual skills. [...]



individual's intelligence by giving an intelligence test
under standardised conditions. The test may use different
types of questions, for example, vocabulary or number
reasoning, to sample various cognitive skills at one
particular time. The individual's level of intelligence is
interpreted by working out his or her deviation or variation on
the test from other individuals who are comparable in age
or grade. But the tests do not tell us how the individual
came to learn or acquire the skills sampled.

Intelligence tests are often called tests of learning ability
because they are most commonly designed to distinguish
between good and poor learners, especially in school
learning. But they do not necessarily point to the underlying
differences between students which can be seen when
they tackle learning tasks. Traditional intelligence tests are
simply a systematic, although very useful, way of comparing
individuals' performances in order to make predictions
about their likely efficiency in other intellectual tasks.
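
The idea of interpreting a score by its deviation from a comparable
age or grade group can be sketched in a few lines of Python. This is
only an illustration, not the scoring procedure of any particular
test: the function names, the sample scores, and the rescaling to a
mean of 100 and a standard deviation of 15 are all assumptions made
for the example.

```python
from statistics import mean, pstdev


def deviation_score(raw, norm_group):
    """Return how many standard deviations `raw` lies above (+) or
    below (-) the mean of a comparable age or grade group."""
    group_mean = mean(norm_group)
    group_sd = pstdev(norm_group)
    if group_sd == 0:
        raise ValueError("norm group has no spread")
    return (raw - group_mean) / group_sd


def to_standard_scale(z, scale_mean=100, scale_sd=15):
    """Rescale a deviation score. The mean of 100 and SD of 15 are an
    assumed convention, not something the article specifies."""
    return scale_mean + scale_sd * z


# Hypothetical raw test scores for children of the same age.
age_group = [20, 25, 30, 35, 40]

z = deviation_score(40, age_group)
print(round(z, 2), round(to_standard_scale(z)))
```

The result says only where one child stands relative to the group at
one particular time; it carries no information about how the skills
were acquired.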

Learning, on the other hand, is a concept used to refer to
changes in a person's performance during the course of
practice, apart from those changes which are due to
extraneous factors such as maturation, fatigue, changes in
motivation, and so on. In principle the laws of learning
could be derived from observations and experiments on a
single person studied over a period of time.

In measuring intelligence we are interested in
differences between individuals, whereas in measuring learning
we are interested in differences within individuals before
and after practice or instruction.

Can we then define intelligence as the result or product
of learning? This would mean that a person's IQ, defined as
the ratio of mental age to chronological age, would be an
index of the rate of learning or information acquisition: it
would show how much an individual has learned in a
given time. But some forms of intelligent behaviour seem
to require skills or concepts which cannot be taught until
the individual has reached a certain stage of maturational
development which allows more complex forms of
learning. For example, we know from the well known
experiments of Piaget that it is not until after the age of about
seven that children can think according to the rules of
formal logic, and then only if the task involves concrete
objects that the child can see and handle. So intelligence
appears to reflect maturational growth as well as past
learning experience.
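
The ratio definition of IQ mentioned above can be written out as a
small worked example. The sketch assumes the usual convention of
multiplying the ratio by 100; the function name and the ages are
hypothetical and chosen only to show the arithmetic.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, multiplied
    by 100 (the factor of 100 is the usual convention, assumed here)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# Hypothetical eight-year-olds with different mental ages (in years).
ahead = ratio_iq(10, 8)     # mental age ahead of chronological age
average = ratio_iq(8, 8)    # mental age equal to chronological age
behind = ratio_iq(6, 8)     # mental age behind chronological age
print(ahead, average, behind)
```

Read as a rate, a value above 100 would mean more had been learned
than is typical for the time available, which is exactly the
interpretation the maturation argument calls into question.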




Thus learning and intelligence, although both
hypothetical constructs used to explain cognitive
performance, are derived from different types of measures, and for
this reason they cannot be readily equated unless there is
strong evidence of an empirical relationship.

What is 'Learning Ability'?

The first step in investigating the relation between learning
and intelligence is to establish that there is in fact a single
construct which can be termed 'learning ability', in which
there are reliable individual differences which can be
shown to be related to individual differences in
intelligence.

If you give a class a set amount of work in a new topic,
such as fractions, you will find after a time that some
children have learned more than others even though none
of the children had any knowledge of the topic to start with.
Equal teaching does not mean equal learning. Much early
research on individual differences in learning was
influenced by the finding that an equal amount of learning
experience, or practice, tends to increase differences
between people rather than equalise their performance. It
was concluded that the differences were due less to previous
learning than to some sort of capacity for learning
through practice. This capacity was presumed to be an
innate characteristic and was termed 'learning ability'. But
results of subsequent investigations into this supposed
capacity proved inconclusive. In one study, students
had practice in seven different intellectual-type learning
tasks over a period of days. Learning was measured by
the amount of gain or improvement in scores over this
period. Factor analysis of their gain scores on a wide variety
of tasks revealed no general factor; instead, nine separate
factors were found, and these involved rather limited
categories of performance. Moreover, these factors could
not be interpreted as measures of learning but appeared to
be more closely related to performance on fairly narrow
tests of ability, such as memory, visual-spatial ability,
speed, and perceptual ability. So there seemed to be no
common, unitary factor of 'learning ability' which could
form the basis of a relationship between learning and
intelligence. Further, correlations between intelligence and
amount of improvement were generally insignificant and
often close to zero.
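
The gain-score approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the six learners' scores are hypothetical and are not taken from the study. It shows how a crude difference score is formed and then correlated with an intelligence measure.

```python
from math import sqrt


def gain_scores(initial, final):
    """Crude difference score for each learner: final minus initial."""
    return [f - i for i, f in zip(initial, final)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six learners (illustration only).
initial_scores = [10, 12, 15, 9, 14, 11]
final_scores = [18, 17, 22, 16, 19, 20]
intelligence = [95, 110, 120, 100, 105, 90]

gains = gain_scores(initial_scores, final_scores)
r = pearson_r(gains, intelligence)
print(gains)
print(round(r, 2))
```

A correlation near zero from such a calculation says only that the crude gain and the test score do not vary together in that sample; as the following paragraphs explain, the gain score itself is a doubtful measure of learning.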

A further result discouraging the equation of learning
ability and intelligence was that factor loadings for some of
the learning tasks changed between the initial and final
scores. For example, the loadings on a verbal comprehension
factor decreased between initial and final trials on all
seven tasks, suggesting that verbal comprehension ability
influenced learning in early trials, but not in later ones.

In conclusion, the experimenter expressed doubts about
the use of crude difference scores for the purpose of finding
a common learning factor. Individuals do not always learn
at a steady rate; the amount of increase in score with
practice may diminish, since we learn at a faster rate when
new to the task. Figure 1, using a typical learning curve,
shows this effect.
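
A minimal sketch of such a curve follows. It assumes, purely for illustration, that each trial closes a fixed fraction of the remaining gap to a ceiling score; the function name and all parameter values are hypothetical and do not come from the study. It shows why per-trial gains shrink, and why two learners with the same learning rate can have very different crude gain scores simply because they start at different points.

```python
def learning_curve(start, ceiling, rate, trials):
    """Scores on successive trials when each trial closes a fixed
    fraction (rate) of the remaining gap to the ceiling."""
    scores = [start]
    for _ in range(trials):
        scores.append(scores[-1] + rate * (ceiling - scores[-1]))
    return scores


scores = learning_curve(start=0.0, ceiling=20.0, rate=0.5, trials=4)
per_trial_gain = [b - a for a, b in zip(scores, scores[1:])]
print(scores)          # [0.0, 10.0, 15.0, 17.5, 18.75]
print(per_trial_gain)  # [10.0, 5.0, 2.5, 1.25]

# Same rate and ceiling, different starting points, two trials each.
novice = learning_curve(start=0.0, ceiling=20.0, rate=0.5, trials=2)
practised = learning_curve(start=10.0, ceiling=20.0, rate=0.5, trials=2)
print(novice[-1] - novice[0])        # 15.0
print(practised[-1] - practised[0])  # 7.5
```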




Many learning tasks have a 'ceiling effect': beyond a certain
stage of acquisition people are unlikely to improve further,
or the nature of the task does not allow further improvement.
Further, changes in individual differences as a result
of practice might be a function of changes in the way a task
is performed during the course of practice. Anyone who
has learned such a complex skill as driving a car will know
that many actions which required a great deal of concentrated
co-ordination in the early stages of learning become
increasingly automatic with practice, allowing attention to
be diverted to coping with more difficult aspects of the task
such as driving in heavy traffic. So in studying learning, it
is essential that a stable behavioural baseline, based on the
individual's prior learning history, and relevant to the task
at hand, be established.

To avoid some of these problems, psychologists have
attempted to devise tasks which may appear to be somewhat
artificial, but are assumed to be relatively independent
of previous learning so that everyone starts from the
same base. Although the learning of such material as nonsense
syllables may seem far removed from classroom
learning, it seems likely that similar sorts of cognitive
processes may be involved. At the same time the development
of more complex statistical techniques has enabled
experimenters to take into account variations in individuals'
initial states of learning, as well as changes in the learning
curve during the course of an experiment, thus improving
the chances of obtaining more reliable measures of learning
ability.

Studies of Learning and Intelligence

Employing some of these improved techniques, some
studies in the 1960s attempted to establish a single factor of
learning ability which could be related to intelligence. One
study set out to investigate individual differences among
children on twelve learning tasks, including word matching,
maze learning, memory for words, listening comprehension,
picture matching, and number-pattern memory.
Learning curves were established for each task, based
on average performance, and measures of learning were
obtained by comparing each individual's performance
with both the curvature of the average learning curve and
the point at which this curve flattened out. A battery of
reference tests, including Primary Mental Abilities tests,
the Otis group intelligence test, and Stanford Achievement
Tests, was given, and inter-correlations calculated for all
measures. Both measures derived from the learning curve
correlated positively with intelligence and achievement,
supporting the belief that intelligence involves ability to
learn. But, in a factor analysis, still no single learning factor
appeared. Instead there were four separate factors: two
memory factors, a numerical factor, and a concentration
factor.

A further attempt to obtain a clearer picture of the
relationship between learning and intelligence was made by
Duncanson, using a smaller number of tasks in
which he systematically varied the content. Three types of
learning tasks, concept formation, paired associates
(learning apparently unrelated pairs), and rote memory,
were combined with three types of material, verbal,
numerical, and figural, to give a total of nine tasks. These were
given to 102 sixth grade children who were also given a
group intelligence test, school achievement tests, and a
selection of marker tests from the Kit of Reference Tests for
Cognitive Factors. Learning measures for each individual
were obtained from their learning curves.

Duncanson's findings can be summarised as follows.
With the exception of concept formation, which appeared
to form a factor on its own, learning was related to
intelligence, scholastic achievement, and to the reference tests.
The tasks were found to be, in general, related to abilities in
a way that was appropriate; for example, tasks involving
words were related to verbal tests, those involving numbers
and figures were related to non-verbal ability, and
paired associates and rote memory tasks were related to a
memory factor. But there were also learning factors which
were independent of test scores: as well as concept formation,
there were also separate verbal and non-verbal factors
which were distinct from the corresponding ability factors.
The most encouraging conclusion from this research was
that, although no single general learning factor appeared,
learning tasks could be reduced to a smaller number of
factors, and some of these factors were related to measures
of intelligence.

Several researchers have attempted to establish a
relationship between intelligence and learning by comparing
how well normal and retarded people perform various
learning tasks. Zeaman and House reviewed the evidence
from several studies. In six studies they found no evidence
of a relationship between intelligence and acquisition of
simple, classically conditioned responses; we all learn
just as fast when the task is as simple as responding to a
dinner bell. But out of eighteen studies relating intelligence
to simple discriminative learning, for example, learning
which of several bells means that dinner is ready, which
one means the telephone is ringing, and so on, twelve
reported positive results, with the more intelligent subjects
learning more quickly. Nine studies reported no reliable
differences in learning by people of various intelligence
levels (the overlapping three reported both positive and
negative results). In general, those studies which found a
positive relationship between learning and intelligence
covered a wider range of intelligence levels, so that any
differences in learning would be more likely to show up.
But the major difference between studies giving positive
and negative results was in the difficulty of the learning
tasks used. Six of the studies producing negative results
used tasks which were either too easy or too hard. For
example, some required only very simple two-choice
discriminations; another used a seven-item, non-verbal,
paired associate discrimination task which was too
difficult for both normal and retarded subjects.

Over the range of studies of learning tasks, Zeaman and
House concluded that at least a low positive correlation
exists between intelligence and learning, provided a wide
range of intelligence is sampled and tasks of intermediate
difficulty are used. An examination of the learning curves
of bright and dull subjects on visual discrimination problems
suggested that the essential difference between the
two groups was how long it took for improvement in
performance to begin, rather than the rate of improvement
once it started. This seemed to be associated with differences
in attention between the two groups.

Towards a Unifying Theory of Learning and Intelligence

One of the problems with studies which have attempted to
relate learning to intelligence is that they have lacked any
theoretical framework to suggest which learning tasks
should be selected for study. The review by Zeaman and
House, leading to the conclusion that attention seems to be
a significant factor in determining differences in learning
performance between groups differing in intelligence,
provides the beginnings of such a framework. Another
useful theory assigns a central role in the development of
intelligence to the individual's ability to transfer previous
learning to learning in a new situation. But more research is
needed into the common elements of learning and intelligence.
Recent research has pointed to two possible directions.
Firstly, we can look at traditional measures of performance
in the areas of either learning or intelligence and
attempt to characterise that performance using concepts
traditionally applied to the other area. Secondly, we can
hypothesise common processes underlying both learning
and intelligence and look for empirical relationships to test
our hypotheses.

In the first type of research, the simplest approach is to
begin with intelligence test items and the possible reasons
for success and failure on those items. Estes illustrated this
approach by analysing four types of test items.

The first, digit span, requires successful recall of a string
of digits after they have been read aloud. After examining
this task, Estes concluded that success did not simply
require good associative memory, but that much longer
strings could be recalled using a strategy of grouping the
items into sub-groups or 'chunks' of approximately three
digits each. A second item type, coding or digit-symbol
substitution, seemed to require both good perceptual ability
and good short-term memory, so that the subject could
distinguish the correct symbol or code and hold it in memory
long enough to reproduce it in the answer box. Success
in vocabulary items, particularly where the subject was
required to produce a definition, seemed to involve a
number of skills which could not be clearly separated.
Word naming, a sub-test of the Stanford-Binet in which the
subject has to produce as many words as he can in one
minute, excluding counting or sentences, seemed to be
related to the subject's ability to recall words in categories:
those who attempted a simple chain association procedure
did less well.

An almost infinite variety of intelligence test items can be
characterised in this manner, suggesting the processes at work
in doing the test items and in various types of learning tasks.

The second type of research assumes that both intellectual
and learning performances can be reduced to a common
set of processes, strategies, or skills. The sorts of
processes which are being investigated are span of
apprehension, speed of information processing, rate of decay
or loss of information, retention of information in correct
sequence, and speed of retrieval of information from long-term
memory. The aim of looking for variation in these
basic processes is not to achieve a better means of classifying
people, but to understand what brings about specific
kinds of competence or incompetence in intellectual tasks.

Some Conclusions 

Can we then give an answer to our original question: Does
intelligence equal learning ability? The not very satisfactory
answer seems to be 'Yes and No'. It is apparent that
some types of learning show a higher correlation with
intelligence than others; the more complex a learning task,
the more likely it is to be related to intelligence. A summary
of research findings suggests how these types of learning
might be identified.

Firstly, learning is more highly correlated with intelligence
when it is intentional and the task calls for conscious
mental effort. Learning involving simple repetition or rote
memory may even be negatively related to intelligence if
processes are contrived to interfere with the rote learning.
In one study, a group of gifted children took longer than an
average group to learn a set of verbal concepts because they
expected the task to be more complex, and wasted time
testing out various hypotheses instead of using purely
associative processes.

Secondly, learning is more closely related to intelligence
when the material to be learned is hierarchical in the sense
that the learning of later elements depends upon the mastery
of earlier elements. The relationship is also higher if
the nature of the learning tasks permits transfer from a
different but related past learning experience, and if the
material to be learned is meaningful in the sense that it is
related in some way to knowledge or experience already
possessed by the learner. Learning the essential content of
a prose passage is much more highly related to intelligence
than is learning the serial order of a list of nonsense syllables.

The relation between learning and intelligence is also
higher when learning is insightful, when it involves
'catching on' or 'getting the idea', or understanding a
principle rather than merely acquiring information.

In addition, the learning task should be age-related and
of moderate difficulty and complexity in order to be related
to intelligence. Some things can be learned almost as easily
by a young child as by an adult, while other forms of
learning are facilitated by maturation or 'readiness'. If a
task is too complex, students at all levels of intelligence may
resort to simpler processes such as trial and error learning.

Finally, learning is more highly correlated with intelligence
at an early stage of learning something new than it is
later in the course of practice. Practice makes a task more
automatic, and hence less demanding of conscious effort
and attention.

In our present educational system, general measures of
intelligence have been used to predict school achievement
with a remarkable degree of success, suggesting that school
learning probably fulfills many of these conditions. But
with the limited state of knowledge at present of the
fundamental processes involved in cognitive behaviour, it
might be safer to regard traditional intelligence or aptitude
tests as useful in predicting the outcomes of learning rather
than the learning process itself. The most commonly used
scholastic aptitude tests are designed to predict the products
of learning in a particular setting. They are not designed
to predict the ways in which different students learn
best, to measure basic processes that underlie various
kinds of learning, nor to assess the abilities needed to learn
a new task.



Notes 

The idea of the supposed capacity called learning ability
was challenged by H. Woodrow in:

Woodrow, H. 'The Ability to Learn', Psychological Review, 53,
pp. 147-158.

Some investigations of the relationship between learning and
intelligence are reported in:

Stake, R.E. Learning Parameters, Aptitudes, and Achievements.

Duncanson, J.P. 'Learning and Measured Abilities', Journal of
Educational Psychology, 57, 1966, pp. 220-9.

Zeaman, D. and House, B.J. 'The Relation of IQ and Learning'.

A theory of intelligence as ability to transfer learning was
proposed in:

Ferguson, G.A. 'On Learning and Human Ability', Canadian
Journal of Psychology, Vol. 8, 1954, pp. 95-112.

An investigation of the possible reasons for success or failure on
test items was proposed by Estes as one way of studying the
relationship between learning and intelligence in:

Estes, W.K. 'Learning Theory and Intelligence', American
Psychologist, Vol. 29, 1974, pp. 740-9.

The conditions under which learning might be related to
intelligence were identified by Jensen in:

Jensen, A.R. The Nature of Intelligence and its Relation to Learning.
Melbourne, Fink Lecture, 1977. Reprinted in Melbourne
Studies in Education, 1978.




TEST BIAS! TEST BIAS? 



By Neil Reid and Alison Gilmore, 



NZCER 



Introduction



It is November 3 and Jonathon Tettey-Jones has just left the room
where he has been sitting School Certificate mathematics. He is
angry and frustrated. 'They asked questions about everything I
didn't swot; there were topics in this we haven't even touched this
year!' he moans to his friend Nigel. 'It's not right! The exam was
unfair; I didn't get a chance to show what I know!' and he slams the
exam paper into his schoolbag in despair.

Across the city at a large contributing school Mrs Carew is sitting
at her desk. She has almost finished tabulating the results of a
scholastic aptitude test administered to her class by the local
intermediate school as part of a pre-entry testing battery. 'The
Maori and Island kids are bottom of the heap again,' she notes.
'They didn't have much of a show with all that difficult vocabulary,
those wordy mathematics problems and tough verbal reasoning
items, especially as some of them don't have much of a grasp of
English.' She muses. 'Still, quite a few of the kids did O.K.; some
pretty high scores too! Big difference between those middle-class
children from Hillcrest and those from the other end of town.
What you'd expect, I suppose, with the test biased in their favour
and discriminating so unfairly against the Polynesian kids.' With a
heavy sigh, she closes the mark register.

In the downtown part of the city, Deborah Pagent, a university
graduate in geology, is being interviewed by a firm of management
consultants. Part of a test battery she is required to take in
applying for a particular job is a well-known measure of
mechanical aptitude. Deborah is nonplussed. She has not even
seen some of the mechanisms depicted, let alone had any
opportunity to learn how they work. 'It's discrimination,' she
thinks. 'Fine for men who've had experience with these things, but
not for women, no way. Definitely unfair; I haven't a hope of
doing well!'

These statements are fairly typical of those that can be heard
every day in and around schools or other places where tests are
given. Are these legitimate complaints against tests? Are tests, as
is claimed, really so biased and unfair to minorities, or to certain
sub-groups? Unfortunately, it is impossible to give a straight 'Yes'
or 'No' answer to these questions. Things are not as simple as they
may appear. The whole topic of test bias is complex and confusing,
partly because there are so many definitions of 'bias' (including
common, everyday use), partly because the issues become highly
emotional for those who see themselves as disadvantaged
or who are championing a cause, and, to some extent, partly
because of confusion in the interpretation, or meaning, of test
scores.

The illustrative statements introducing this article raise
some of the major issues. While there is no universally accepted
definition of test bias, five broad types of bias can be distinguished:
content bias, bias in language, atmosphere bias, [...]
and bias in interpretation with social consequences.
The intention of this article is to clarify the issues so that a more
enlightened debate can take place. Accordingly, we introduce
these different kinds of test bias, examine the basis of the claims
made against tests, and discuss whether such claims are justified.
Space precludes extended discussion, but several readily
available references are provided at the end of the article for
further reading.



1. Content Bias

Most claims of bias in a test concern its content. Each of the three
illustrations contains elements of content bias, which is the type of
test bias that comes most readily to mind for most people.

A test with biased content contains questions that in one sense
or another are 'unfair' to an identifiable sub-group of those being
tested. Examples of typical bias cited include: words that are
unknown or unfamiliar; topics that have not been covered
adequately or at all; questions drawing on experiences outside the
range of those normally expected for an age, class or minority
group; emphasis in the items on values and skills that are
traditionally middle-class European, and so on. The concerns are
genuine. Are they justified?

A test is normally given, in the first place, to gauge how many of
the questions it contains can be answered correctly. What is sought
is information about the ability being assessed, as revealed by the
answers to the questions, not reasons for inability to achieve, such
as restricted opportunity to learn the content, or failure to take
advantage of that opportunity. Unquestionably, middle-class
European children in our society are better fitted to take these
tests. Their more extensive experience means that they have had a
greater chance of acquiring the kinds of skills and competencies
which underlie school performance and which are incorporated in
scholastic-type tests emphasizing verbal and numerical abilities.
But this does not make the tests biased against other students. As
long as the tests can accurately assess present attainment and
predict the school performance of children, it makes no difference
for the validity of the tests how the children have come by the skills
and knowledge. The tests still serve the purposes for which they
have been designed.

Turnbull, speaking before a U.S. Senate Subcommittee in 1977,
saw it this way:

The test score tells you how well the student has mastered the
skill in question. It does not tell you the obstacles he or she has
overcome to attain the degree of proficiency. If one is concerned
with helping students develop a set of skills necessary to get
along in our complex society, it is important to be able to
measure attainment separately from the question of how the
learning was or was not acquired.

Eliminating Content Bias

Widely used published tests are usually carefully constructed and,
in the case of achievement tests, they sample content that is
specified by an accepted syllabus, curriculum, textbook series and
so on. In this type of test, content validity is of paramount
importance. Aptitude tests also sample a domain that is described
and delineated by the test maker. In both cases the tests are
subjected to 'sensitivity' reviews, usually involving members of
identifiable minority groups, in an effort to detect and eliminate
content or questions that are offensive, unfair, ambiguous or
inappropriate. Words, phrases or descriptions considered in any
way biased are removed.

In addition, the tests have to meet rigorous statistical checks,
and trialling is done with the kinds of students who will eventually
take them, including those from cultures and backgrounds outside
the mainstream. Where there is evidence of particular sub-groups
performing differentially on specific items, such items are carefully
scrutinized for possible content bias. If a source of bias is detected,
the items are either re-written to eliminate the bias or discarded.
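
As a rough illustration of what screening items for differential performance could look like, the sketch below compares the proportion answering each item correctly in two sub-groups and flags items whose gap departs markedly from the typical gap across the test. This is an assumption-laden toy, not the procedure any test publisher is known to use: the function names, the 0.25 threshold and the response data are all hypothetical, and operational analyses would also match students on overall ability.

```python
from statistics import median


def proportion_correct(responses):
    """responses: one list of 0/1 item scores per student."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(student[i] for student in responses) / n_students
            for i in range(n_items)]


def flag_items(group_a, group_b, threshold=0.25):
    """Return indices of items whose between-group gap in proportion
    correct differs from the median gap by more than the threshold."""
    p_a = proportion_correct(group_a)
    p_b = proportion_correct(group_b)
    gaps = [a - b for a, b in zip(p_a, p_b)]
    typical_gap = median(gaps)
    return [i for i, gap in enumerate(gaps)
            if abs(gap - typical_gap) > threshold]


# Hypothetical trial data: four students per group, four items.
group_a = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 1]]
group_b = [[1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]

print(flag_items(group_a, group_b))  # [3]
```

A flagged item is only a candidate for the careful scrutiny described above; the gap is compared with the typical gap rather than with zero because an overall difference between groups is not in itself evidence of item bias.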




While it is possible to compose 'bad' tests, reputable test
developers try to meet stringent professional standards and
requirements, including several that apply to test bias (see, for
example, Standards for Educational and Psychological Tests, 1974).
Test makers must be constantly alert to possible sources of bias in
their products. Even the appearance of discrimination needs to be
carefully avoided.



Achievement and Aptitude

Before proceeding, we will make an important distinction between
achievement and aptitude tests and the interpretation of scores
made on them. It is a difference that arises again and again and is
crucial to a clear understanding of the bias issues.

Despite a prevailing myth to the contrary, tests of pure aptitude
do not exist. Since current aptitude cannot be measured in isolation
from past achievement, the two types of test obviously overlap in
content. However, there is a great deal of difference in how the
results of such tests are interpreted. And it is this aspect, with its
considerable social ramifications, that triggers the emotional
reaction attached to test score interpretation.

A high score on an achievement test can be interpreted
legitimately as an indication that the content of the test has been
learnt and that further instruction in the area examined is probably
not necessary. A low score on the same test may be interpreted as
evidence of a lack of attainment and hence further educational
effort or input is required.

On the other hand, an aptitude test, instead of measuring
attainment, is used in predicting future achievement on the basis
of present accomplishment. A high score will be interpreted as
evidence of ability to learn now and in the immediate future. But a
low score may be interpreted as showing that the pupil has
insufficient capability to achieve, and in some cases certain
educational resources and opportunities to learn may therefore be
withdrawn. Hence, in the difference between the interpretation of
achievement tests and aptitude tests lies a decision of considerable
social significance. In the former, a low score prompts an increase
in education resources to boost achievement; in the latter, the
same low test score may result in resources being withdrawn or
denied, the very resources that might help to alleviate the lack of
abilities or competencies. However, to assign pupils to slow-learning
classes on the basis of a test score alone, to water down
the curriculum and to lower expectations on their behalf, will do
nothing to enhance the learning chances of low-scoring students.

Failure to make this distinction between achievement and
aptitude can lead to claims of bias. Where Maori students as a
group, for example, score low on the PAT: Reading Vocabulary tests,
the charge is one of 'bias' and may lead to an abandonment of the
tests, rather than a demand for an improvement in the educational
system. Yet when the tests are subjected to close scrutiny, both
judgemental and empirical, the test content is found to be a sample
of words taken from a whole range of reading material to which
children are exposed both inside and outside school, exactly as
one might expect. The test scores do not indicate a lack of capacity
to achieve. Instead, they provide evidence that additional
educational effort, perhaps more of the same, perhaps a different
approach, is required since achievement is lacking. If you have a
high fever you don't solve the problem by smashing the
thermometer. The challenge is surely to use the information as an
aid for identifying the most effective education experiences for the
pupils.

Achievement Elsewhere?

Minority groups sometimes argue that their children are achieving
in the school system, and it's just that the tests are asking the
wrong questions and testing the wrong things. The implication, of
course, is that somewhere there is a group of 'right' questions
which, if posed, would show satisfactory achievement by the
minority students. And, conversely, we often have the same
people claiming the system has failed their children, citing the low
achievement, as indicated by test scores, to prove the point! Such a
claim is justified if the questions are invalid and do not
represent a balanced sample of curriculum content, what has been
taught, textbook content and so on.



Mean Score Differences

It is often claimed that pupils from ethnic minorities or low
socio-economic status homes achieve lower scores on achievement and
aptitude tests because the tests are biased. This view is mistaken;
mean differences in themselves are not a legitimate standard for
identifying bias. If they were, as Gardner has pointed out,
every spelling test would be biased against poor spellers,
every vocabulary test against persons who had poor vocabularies.
. . .

In considering why such score differences are revealed by tests,
Flaugher, in his cogent 1978 article on test bias, states the position
succinctly.

. . . it would be surprising if most kinds of tests didn't show
mean differences in favour of the majority group. It would have
to be a peculiar kind of test indeed to fail to reflect the disparities
and differing advantages that are so evident through other
indicators. Yet many critics of testing merge the concept of
equality of opportunity, which is certainly a legitimate goal to
be sought, with the concept of equality of result: but it is only
results that the tests in fact measure. The existing discrepancy is
evidence that the legitimate goal has not been attained: to
accept the discrepancy instead as evidence of test bias is to
deflect attention from the pursuit of that legitimate goal.

Clearly it is the opportunity to learn which is crucial. Since the
test maker must work on the basic assumption that all students
have had reasonably similar experiences and backgrounds before
taking the test, where students have been denied the opportunity
to learn because of restricted background and/or disadvantaged
home and/or school environments, the group differences will
often simply reflect this rather than true differences in ability.
Knowledge of the test taker's circumstances must always be taken
into account in deciding on a test's suitability and in interpreting
the test scores (see TOSCA: Teacher's Manual, p.4).



Fairness in Relation to Use

To complicate the issue, test fairness in terms of content can be
evaluated only in relation to use. We are interested in assessing
students' present performance on the tasks making up the test,
tasks that are demonstrably appropriate and significant (i.e., valid
for the intended purpose), rather than the sources of variance in
the test scores (i.e., the reasons why some students do not
achieve). Doubtless, those minority and disadvantaged students
whose experience and background have not equipped them with
the competencies essential for school learning will score low on
achievement and scholastic aptitude tests. Insofar as these factors
affect performance, their influence will and should be detected by
the tests. However, those low scores will represent not a bias in the
tests, but a genuine deficiency in those pupils, a lack of the
particular kind of ability being measured. The tests tend to be
accurately descriptive of current accomplishments. It is vital that
unjustifiable and damaging inferences are not made about the
innate capacity of an entire ethnic group or students from a
particular SES category on the basis of the low test scores, or about
the permanence of such ability deficits.

A further complication is that a content-biased test may be a
valid and useful test in particular circumstances. Tongan
learner-drivers, for example, might not have had the same opportunity to
learn the English language as have other New Zealand students.
So, a driving test that required the reading of road signs written in
English, such as LOW OVERHEAD CLEARANCE, BRIDGE
UNDER REPAIR, NEW SEAL LOOSE CHIPS, METAL SURFACE
AHEAD SLOW DOWN, and so on, would be entirely appropriate;
the test is obviously necessary, even though a sub-group of those
taking it might be disadvantaged. Content bias is inevitably bound
up with test use.



While it is true to say that achievement and aptitude tests do a
tolerably fair job of assessing certain facets of the vast domain of
human behaviour, it might also be claimed that this limited
spectrum has tended to become over-valued when compared with
those other traits and abilities that are measured far less well, such
as creativity, critical thinking, and valuing. As one writer has put it:
'Since we can't measure all of the important things we consider
what we can measure all important' (Flaugher, 1978). And when
tests are seen to be tapping these 'valued' skills and abilities to the
exclusion of others, the issue of bias is quite rightly raised by those
who detect the deficiencies.

There is, then, a discrepancy between what the test makers claim
they can measure validly and reliably and what the public at large
believes the state of the art of educational and psychological
measurement to be. There is a tendency towards over-interpretation,
and some members of the assessment enterprise
must take some of the blame for this distortion. Their past claims
for what they are able to measure have not always been
conservative. It is something of a leap to go from failing to correctly
solve several verbal and quantitative problems in a pencil-and-paper
intelligence test, to being labelled as 'unintelligent'. This
kind of over-interpretation of the intent and the outcomes of
testing is a legitimate aspect of test bias. Obviously, great care
must be taken, by both test makers and test users, to evaluate test
performance within the constraints of the test's content.



2. Language Bias

Whereas achievement and scholastic aptitude test results may
provide a reasonably accurate indication of a pupil's current ability
to deal with curricular materials and to cope with the knowledge,
skills and understandings that predominate in school learning,
what is presented in tests of the kind most commonly used will be
written in standard English. In this sense they clearly favour those
whose mother tongue is English. And, in this respect, such tests
will place at a disadvantage those whose mastery of the English
language, for whatever reason, is less than that of their age and
class peers. But, it does not follow that the test is biased against
these pupils, and that they would do better if the test was
translated into their first language. It simply means that these
students have poor English language skills, something that can be
remedied. Nor does it mean that the tests should be discarded or
that they are less useful for the purpose for which they were
designed. Above all, it does not mean that a child who has had
limited exposure to English and who attains a low score is inferior
or stupid. Such inferences are unjustified and reprehensible.

Considerable responsibility devolves on the test user in
instances of this kind. For example, why on earth give a PAT:
Reading Comprehension Test to a Vietnamese student who has been
in the country for three months? Such a use is obviously
inappropriate and the test score will reveal nothing not known
already.



Appropriate Language Levels

Test constructors must take care not to confound what they are
seeking to measure by presenting their tests in vocabulary and
syntax well beyond the level of the pupils for whom the tests are
designed. If such care is not taken, the test would indeed be
unsuitable for the poorer readers. A distinction must be made here
between the reading difficulty in, say, a mathematics test, of its
mathematical terminology (which is entirely appropriate) and of
other words appearing in the test stimulus material and questions.
Essentially, the score on the test should reflect accurately the
mathematics knowledge and understanding of the test taker and
not his or her general reading ability. The bias introduced by the
unnecessarily complex language will have a confounding effect on
the measurement of the mathematical skills. The way round this is
to make the measure more appropriate by modifying the language
of the test.






Sex Stereotyping

As Flaugher points out, there are great similarities in the unfair
treatment of ethnic minorities and women. More than any other
group, women are up against the English language itself: the
language has a distinct masculine bias, particularly in the generic
use of male nouns and pronouns when the content refers to both
sexes. And this bias is reflected in tests as in textbooks, curricular
materials and the like. In other words, if the curriculum, printed
materials, illustrative examples and so on are slanted to any extent,
or if they present current stereotypes, as they are bound to, then
these same biases and cultural content will be reflected in the tests,
if the tests are to sample content and curriculum
emphases representatively and accurately as achievement tests
must.

It is up to the test makers to do their utmost to ensure that their
products do not serve to perpetuate the image of sexual inequity or
use sexist language when it can be avoided. As has been stated
above, tests cannot lead this change; they must reflect the status
quo, particularly in terms of content and emphases in achievement
testing. What can be done, though, is to make others more aware
of and sensitive to these subtle biases and to eliminate language
and stereotypes offensive to women.



3. Test Atmosphere Bias

Claims are sometimes made that typical achievement and aptitude
tests underestimate the actual performance of minority groups,
because the test takers react negatively to certain aspects of the
testing situation. Children from these groups, it is said, obtain low
scores, not because they lack the abilities measured by the tests
but because of non-cognitive factors, as discussed below.

Interaction Effects

One of the factors that has received considerable attention is the
interaction that occurs between the tester and the person being
tested. Test administrators assume that those taking the tests will
be motivated to do their best, and that by establishing rapport and
using standardized administrative procedures 'tester effects' will
be minimized, particularly for individualized testing, where the
level of interaction is high.

The critics charge that, while these assumptions may hold true
for middle-class European children, they usually don't hold true
for those from other sub-cultures or socio-economic levels. To
remedy this situation and to counteract likely effects, they have
suggested that the tester should always be from the same ethnic,
socio-economic or sex sub-group as those being tested. Apart from
the impracticality of such a proposal, research into atmosphere
bias provides no empirical evidence to clearly support the critics'
contentions. Nevertheless, it would be unwise to ignore these
potential sources of bias and test users should be alert to the
possibility of such effects.

The Test Situation

Perhaps the concern expressed in this aspect of test bias would be
better directed towards the whole social psychology of the testing
situation? Maybe it is the very act of testing itself which is unfair for
some persons in that they are unable to demonstrate their real
capabilities. They might well be inhibited when confronted by a
test because of past negative, hurtful experiences, which could be
commonplace for certain minority groups.

As test users, we should perhaps ask ourselves whether we
should administer nationally normed standardized tests to groups
of students who, in terms of academic achievement, differ
markedly from the majority of children. Perhaps other kinds of
testing - tailored and/or criterion-referenced - might be more
appropriate and less stressful? Of course, if one of the purposes of
the school in using standardized tests is to relate the group's
achievement to that of age and class peers nation-wide, then the
testing of these students with appropriate standardized tests must
be conducted; there is no other way to gain such information.
However, it does seem counter-productive to inflict periodically
both these students and their teachers with detailed evidence of
just how far from the national norm they in fact are, and possibly
snuff out any flickering enthusiasm for learning as well as
adversely affecting teacher morale. As Flaugher has stated:

... students and teachers in [this] setting know this kind of test
bias in a very personal and painful way and understandably are
hostile, ready to condemn that process and testing itself as a
demonstrably harmful influence on their lives.

In such situations an atmosphere of discouragement and
disincentive is created by the assessment process itself, rather than
by those potential sources of atmosphere bias identified earlier.
With this in mind, test users should try to choose or to develop an
assessment technique that will minimize these effects and strive to
provide testing conditions consistent with the purposes of the
assessment. Easily said, not so easily done!



4. Test Bias in Prediction and Selection 

In a society like ours, in which equality of opportunity is
universally accepted, the problem of bias in prediction and
selection, whether for admission to educational institutions or for
employment, is a troublesome one. Currently, tests are used
extensively to predict future achievement and performance, to
control entry to tertiary and specialist education, and to help in
deciding about suitability or otherwise for particular vocations.

A test is considered to be biased if the predictions or decisions
based on the test scores vary for different groups. Matters of
differential validity and predictive validity arise, and the
discussion necessarily involves statistical considerations. Again,
the treatment of this aspect of the topic must be over-simplified,
but essentially the nub of the issue is as follows.

If a test is examined for possible differences in predictive validity
for ethnic, sex or other identifiable sub-samples of the population
for which the test has been designed, and if no significant
differences are found between the groups, then the same
decision-making rules can be used for everyone, regardless of group
membership. The statistical technique employed is most
commonly the calculation of the relationship between the test
score and some criterion measure, such as School Certificate marks
(4 best subjects), first-year university performance, an interview
rating, or the number and value of sales made over a specified
period of time. If the test is predicting equally well for each group,
then there is no problem. But, if it is not, and the testing procedure
is deemed to be invalid for one of the groups, then alternative
assessment methods need to be devised for that group.
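
The check described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in any of the studies mentioned: it assumes the "relationship" is a Pearson correlation between test score and criterion, computed separately for each group, and it compares two groups with a Fisher z test, which is one common choice among several (comparing regression slopes and intercepts is another). The function names and the scores in the demonstration are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validity_by_group(records):
    """Predictive validity (test vs criterion correlation) per group.

    records: iterable of (group, test_score, criterion_score) tuples.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, test, criterion in records:
        grouped[group][0].append(test)
        grouped[group][1].append(criterion)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in grouped.items()}


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    Assumes |r| < 1 and more than three cases in each group.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


if __name__ == "__main__":
    # Hypothetical (group, test score, criterion mark) records.
    records = [
        ("A", 52, 55), ("A", 61, 60), ("A", 70, 72),
        ("A", 45, 50), ("A", 80, 78), ("A", 66, 63),
        ("B", 50, 62), ("B", 63, 58), ("B", 71, 65),
        ("B", 44, 60), ("B", 79, 70), ("B", 68, 57),
    ]
    validity = validity_by_group(records)
    for group, r in sorted(validity.items()):
        print(group, round(r, 2))
    z = fisher_z_difference(validity["A"], 6, validity["B"], 6)
    print("z for difference:", round(z, 2))
```

With samples as small as those in the demonstration the z statistic means very little; real validation studies need far larger groups before a difference in validity can be taken seriously.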

Much overseas research comparing prediction for whites and
blacks, largely from the U.S.A., has concluded that differential
validity is nonexistent, and that even if it does exist, it is not a very
potent phenomenon: it is a pseudo problem. Contrary to what is
often supposed, several of these investigations have revealed that
test scores predict in favour of blacks, and that it is persons from
the dominant culture, the majority group, who are being
discriminated against! Paradoxically, the tests are not considered
biased in this case. As Silverman has observed: 'Tests are only
thought of as biased when they assign comparatively low scores to
easily identifiable sub-groups.'

Biased Criteria

However, there are complications yet again. Some critics of testing
have raised a noteworthy point. They contend that a test may be
demonstrated to be valid for predicting, say, school success, but it
may still be unfair to minority or disadvantaged students, because
the criterion used in the validation study, for example, a score on
the Test of Scholastic Abilities (TOSCA) (the predictor) correlated
with 4-best School Certificate subject marks (the criterion), is itself
biased. And that brings us back to the issue of the appropriate uses
to which the results are put, and to the social consequences of
decisions based on them. This aspect of test bias has received scant
attention in the research literature; as Gulliksen has pointed out,
it presents a complex problem that is far from being solved.

What is to be done when the reliabilities of the predictor and the
criterion are different for different groups? It has been suggested
that this can itself cause the mean score differences often
observed between majority and minority groups. Queries about
equivalence naturally arise.

What do we do when the criterion is biased for a minority group,
yet is traditional, well-tried, accepted by the vast majority, and
cannot be replaced by anything remotely as good and useful? Just
how sound (and unbiased) are our traditional criteria of external
examination marks, supervisors' ratings, internally assessed
teacher grades, speed of work, and so on?

In the United States, court battles are being fought regularly
over these and similar criteria and their use in prediction and
selection. Legal action has been taken to challenge alleged
discriminatory practices operating in selection for admission to
tertiary educational institutions, in employment and most recently
in school systems requiring 'minimal competencies' of their
students before graduation. The result has been the formulation of
a series of guidelines and regulations in an effort to ensure fair
treatment for all.

Test makers and test users need to be alert to differences in
validity for various identifiable sub-groups within our
heterogeneous, multicultural population, and to modify their
decision-making procedures accordingly.



In matters of selection fairness, what we should guard against is
any kind of 'double standard'. Take the situation where two
candidates are up for selection. They gain identical scores on the
prediction test, but are treated differently according to ethnic
identity, with a lower requirement being accepted for the minority
candidate. While any 'double standard' of this kind threatens
seriously our treasured principle of 'equality of opportunity', it
might be noted that we now have the concept of 'positive
discrimination' entering into recent legislation, as in the New
Zealand Human Rights Commission (1977) and Race Relations
(1971) Acts.



5. Test Bias as Inappropriate Use with Social
Consequences

Reschley has neatly summarized this important aspect of test bias.
He states:

The ultimate criteria that should guide our evaluation of test
bias are the implications and outcomes of test use for
individuals. Succinctly stated, test use is fair if the results are
more effective interventions leading to improved competencies
and expanded opportunities for individuals. Test use is unfair if
opportunities are diminished or if individuals are exposed to
ineffective intervention as a result of tests.

A classic example of an unfair use of test results would be the
assignment of a child to a slow-learner class on the basis of a single
test score, unsupported by other evidence. It was just this kind of
abuse that resulted in a Californian court forbidding San Francisco
School District psychologists the use of individual intelligence
tests for the assignment of minority students to classes for the
educable mentally retarded (Case of Larry P. et al. vs Wilson Riles
et al., 1972). While Reschley takes a broad view that encompasses
additional issues, fair uses of tests are clearly those which foster
individual development, whereas unfair ones hinder that
development. Simple and direct!

Darlington, taking a slightly different tack, reminds us that a test
is a tool and as such is not a bad device per se: '... it is the
particular use of a test, not the test itself, which is fair or unfair.'
The burden of responsibility shifts from the test maker to the test
user whose duty it is to be aware of and to eliminate discriminatory
circumstances.



Conclusion

The word 'bias' itself is almost always used in a pejorative sense; it
evokes, as Gardner notes, '... affective, even visceral reaction.'
As educators who employ tests in our work we need to be absolutely
sure how we're defining and using the word 'bias'. We
should avoid using it loosely, and be mindful of its various aspects
in test use and the interpretation of test scores.

Throughout the discussion the need to be alert to possible
sources of bias in tests has been reiterated. These cautions must be
taken seriously by all educators involved in testing on
professional, ethical and legal grounds. But, equally important, as
the APA Standards warn, is to avoid seeing bias where none is
present. As test users, let us avoid chasing elusive and possibly
imaginary bogeymen.

The issues are complex and confused. It is also doubtful whether
all the factors that lead to bias have yet been identified, much less
understood, controlled for, or corrected. Although psychometric
techniques are undoubtedly improving, there is far from universal
agreement as to their mathematical elegance, application and
usefulness. And a purely technical solution of the many
problems of test bias is unlikely to be adequate as value judgements,
ethical considerations and legal aspects are involved right down
the line. In the long run it is far more likely that these matters, to
paraphrase Mercer, will be settled in the political arena rather than
in the halls of academe, regrettable as that may be.

On the other hand, while we have indicated that the testing
fraternity is striving to grapple with the contentious issues, we
should be mindful of the alternatives to achievement and aptitude
testing. Robert Ebel comments on what some of the social
consequences of not testing might be:

Excellence in programs of education would become less
tangible as a goal and less demonstrable as an attainment.
Education opportunities would be extended less on the basis of
aptitude and merit and more on the basis of ancestry and
influence; social class barriers would become less permeable.
Decisions on important issues of curriculum and method would
be made less on the basis of solid evidence and more on the
basis of prejudice or caprice.

All things considered, maybe we shouldn't burn the barn to
catch the mouse!



References 

The statements by W. W. Turnbull, formerly president of ETS, can be found in

Professional matters, standards and requirements are fully discussed
in

American Psychological Association, American Educational Research
Association and National Council on Measurement in Education (1974)
Standards for Educational and Psychological Tests. Washington, D.C.: APA.

The two quotations from Gardner come from

The quotations from Flaugher can be found in

The quotation from Silverman can be found in

SilvrfiTiMn, H ( li^st hii- *md .ihihfv \v UM>n>; U*tttniil af sfhk^ 

Gulliksen's comments on biased criteria can be found in

tHM7t^ im rnmvnm. \I t IS 

The quotation from Reschley can be found in

Reschley, D. J. (1978) Nonbiased Assessment. Des Moines, Iowa: State of
Iowa Department of Public Instruction, p. 11.



The view that political considerations, not technical ones, will probably
IK 

Pnnivtun. \} H^. pp 142 141 
AhhtH*^ VVdlmgton. \/i't R. 








THE BEST"" 



Contents and Commentary 

Cedric Croft 



Measurement and 



Reporting 



Assessment Issues and 
Measurement Concepts 

Overview of Issues in School
Assessment

Barry McGaw

Achievement Test Scores in
Perspective

Bill Turnbull

Foundations of School Testing

Cedric Croft

Test Evaluation Sheet

Cedric Croft



Assessing What They've Learned

Warwick Elley

Criterion Referenced Tests

Glenn Rowley and Colin McPherson

Investing in Item Banks

Neil Reid

Combining Scores

Alison Gilmore

Evaluating Writing

David Philips

Observation: The Basic Techniques

Bruce McMillan and Anne Meade



One Extreme to the Other: 
A Report on Profile Reports 

Graeme Withers 



Assessment, 
Abilities and Culture 



Non-Verbal Tests in Schools

Cedric Croft 

Does Intelligence Equal 
Learning Ability? 

Jo Jenkinson 

Test Bias! Test Bias! 

Neil Reid and Alison Gilmore 






